Abstract. In this paper, we propose a new operation on words, Generalised Sequential Crossover (GSCO), which is in some sense an abstract model of the crossing over of chromosomes in living organisms. We extend GSCO iteratively over a language $L$ ($GSCO^*(L)$), and we also define iterated GSCO over two languages ($GSCO^*(L_1, L_2)$). Our study reveals that $GSCO^*(L)$ is regular for any $L$, so the iterated GSCO languages form a subclass of the regular languages. We compare the different classes of GSCO languages with the prominent sub-regular classes.

1 Introduction 

Self-assembly is a process in which smaller objects selectively aggregate with each other into a complex structure, which in turn self-assembles into larger aggregates. The process is widespread in nature: atoms self-assemble into molecules, molecules into crystals, cells into tissues, etc. Self-assembly is an important tool in nano-technology, because it takes nature as a model and tries to assemble structures from the atomic level (the bottom-up approach). It is considered a promising technique for fabricating small complex objects such as computer circuits.

A particular case of self-assembly is linear self-assembly, in which one-dimensional objects such as DNA double strands interact with each other to form longer strands. DNA recombination is one such form of DNA self-assembly, and Adleman used it to solve an instance of the Hamiltonian path problem [1]. For more than a decade now, self-assembly has been at the core of most experiments in DNA computing, starting with Adleman's celebrated experiment [1, 10, 22]. Recent developments in DNA computing have highlighted the intimate connection between self-assembly and computation. The computational uses of DNA self-assembly are studied in [27].

Most complexity-theoretic studies of self-assembly use mathematical models. Alternative models, such as the self-assembly of objects driven by capillary, electrostatic, or magnetic forces, have also been studied.

In recent years, interest in the study of self-assembly has been converging from mathematics, computer science, physics, chemistry, and biology. Yet the mechanisms of these processes are still poorly understood and pose a formidable challenge. Attempts have been made to study self-assembly in different frameworks, such as tile-based self-assembly [5, 16, 27-29]. Perhaps the best model for self-assembly was proposed in [29]. With the aim of making the process of self-assembly clearer, studies of abstract models such as the self-assembly of strings were initiated [7]. In [4] the authors introduce an operation on strings and languages called "superposition", which is similar to Csuhaj-Varju's operation of self-assembly on strings, although the two approaches differ.

Inspired by the different models of self-assembly, in particular the string self-assembly of Csuhaj-Varju [7], we propose a string-based operation that may be seen as a generalisation of the self-assembly operation proposed in Csuhaj-Varju's paper [7]. In Csuhaj-Varju's model, two strings $uv$ and $vw$ self-assemble over $v$ and generate $uvw$. Here $v$ is the overlapping string. This raises a question: what happens if we do not require the overlapping string to lie at the end of the first string and at the beginning of the second? In answer, we propose a new operation on two strings. Two strings $u_1xv_1$ and $u_2xv_2$ self-assemble over the substring $x$ (the overlapping string) and generate the strings $u_1xv_2$ and $u_2xv_1$, as illustrated in Figure 1.
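The Csuhaj-Varju step described above can be sketched in Python for contrast with the operation proposed here. This is a minimal sketch that assumes the overlap $v$ is non-empty. The function name `overlap_assembly` is hypothetical and does not come from [7].

```python
def overlap_assembly(s: str, t: str) -> set[str]:
    """Return every u+v+w such that s == u+v and t == v+w, with v non-empty.

    The overlap v must be a suffix of s and a prefix of t; this end/start
    restriction is exactly what the new operation drops.
    """
    results = set()
    for k in range(1, min(len(s), len(t)) + 1):
        if s[-k:] == t[:k]:          # v = s[-k:] = t[:k]
            results.add(s + t[k:])   # u v w
    return results


print(sorted(overlap_assembly("abc", "bcd")))  # ['abcd']
print(sorted(overlap_assembly("aba", "aba")))  # ['aba', 'ababa']
```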

Normally, no portion of the components taking part in a self-assembly process is lost. In that sense, our new operation on strings, in which some portions of the strings are lost, can no longer be called an abstraction of the self-assembly process.

However, in one sense our operation resembles the recombination of chromosomes by exchange of segments between homologous chromosomes, known as crossing-over. A chromosome is a single piece of DNA that contains many genes, regulatory elements and other nucleotide sequences. Each gene occupies a well-defined site, or locus, in its chromosome, and has corresponding locations in the pair of homologous chromosomes. Chromosomal crossover (or crossing-over) is the process by which two chromosomes pair up and exchange their DNA. Crossover usually occurs when matching regions on homologous chromosomes break and then reconnect to the other chromosome. The result of this process is an exchange of genes, called genetic recombination, which leads to genetic variability. Crossover can occur at one or more points along the adjacent chromosomes.

Fig. 1. A scheme for crossover of two strings, $w_1 = u_1xv_1$ and $w_2 = u_2xv_2$.

In [20], an operation on strings and languages with the same feature is introduced. Every chromosome is considered as a string. Because crossing over takes place between homologous chromosomes, the operation applies only to pairs of strings of equal length.

Each string is cut into several fragments at the same sites in both strings, and the fragments are then recombined by ligases. A new string of the same length is formed as follows: start at the left end of one parent and copy a segment, cross over at the next site to the other parent and copy a substring, cross back to the first parent, and so on until the right end of one parent is reached. Another new string can be obtained in the same way by starting with the other parent. This crossover operation [20] on strings is similar to chromosome crossing-over. A generalisation of the splicing system is proposed in [21].
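The multi-site crossover just described can be sketched in Python. The sketch assumes that the cut sites are given as positions shared by both parents. The helper `multipoint_crossover` is hypothetical, and the formulation in [20] may differ in detail, for instance in how sites are specified.

```python
def multipoint_crossover(p: str, q: str, sites: list[int]) -> tuple[str, str]:
    """Cut equal-length parents p and q at the same sites and alternate segments."""
    if len(p) != len(q):
        raise ValueError("parents must have equal length")
    bounds = [0, *sorted(sites), len(p)]
    child1, child2 = [], []
    for i in range(len(bounds) - 1):
        lo, hi = bounds[i], bounds[i + 1]
        # Even-numbered segments come from the starting parent; odd ones switch.
        first, second = (p, q) if i % 2 == 0 else (q, p)
        child1.append(first[lo:hi])
        child2.append(second[lo:hi])
    return "".join(child1), "".join(child2)


print(multipoint_crossover("aaaa", "bbbb", [1, 3]))  # ('abba', 'baab')
```

Both children keep the parents' length, which is why the operation in [20] is restricted to strings of equal length.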

Our proposal, in which two strings $u_1xv_1$ and $u_2xv_2$ overlap at the substring $x$ and generate the strings $u_1xv_2$ and $u_2xv_1$, differs from the crossover operation of [20] in two respects. First, in our model, words of different lengths can participate in a crossover. Second, crossing over occurs at only one site between the words. For these reasons, we call our operation Generalised Sequential Crossover (GSCO). We use the adjective generalised in the sense that crossover can occur between any two words of any length, and the adjective sequential in the sense that the crossover between two words occurs at only one point (site), in contrast to the one or more points at which chromosomes may cross over.

Two strings may share more than one common overlap, so the result of the GSCO of two strings is in general a set of strings. As usual in formal language theory, we extend GSCO to languages and define an iterated version of GSCO over a language.

Our study answers several questions relevant to nano-scale fabrication. Can we decide whether a given language can be obtained by iterated GSCO, and if so, can we effectively construct a minimal finite set of initial strings? Given such a finite set of strings, what language can be generated by GSCO?

Although GSCO is just an abstraction of the crossover operation introduced in [20], our study reveals many interesting results. For example, iterated GSCO of any language is always regular, and a subclass of GSCO languages coincides exactly with the strictly locally testable languages (SLT) [18]. This gives a new characterisation of SLT languages using iterated GSCO.

Section 2 presents the preliminaries required for this paper. Section 3 introduces the GSCO operation on words and languages along with some basic results, together with a variant of GSCO called CGSCO. Section 4 shows that the operations 1-GSCO and 2-GSCO over a language $L$ coincide. Two types of iteration are defined for GSCO, and their equivalence is discussed in Section 5. Section 6 discusses the regularity of GSCO languages. Section 7 compares the GSCO languages with other regular subclasses.



2 Preliminaries 

Throughout this paper, we assume that the reader is familiar with the fundamental concepts of formal language theory and automata, in particular grammars and finite automata [14]. We list here some notations and notions used in this paper.
2.1 Basic notations of formal language theory

An alphabet is always a finite set of letters, denoted by $\Sigma$. The set of all words over an alphabet $\Sigma$ is denoted by $\Sigma^*$. The empty word is denoted by $\epsilon$. Further, $\Sigma^+ = \Sigma^* \setminus \{\epsilon\}$. Given a word $w$, the number of symbols in $w$ is the length of the word, denoted by $|w|$. A word $v$ is a sub-word (in the literature also called a factor) of a word $w$ if there are words $u_1$ and $u_2$ (possibly empty) such that $w = u_1vu_2$. $v$ is called a prefix of $w$ ($\mathrm{Prefix}(w)$) if $w = vu$. Similarly, $v$ is called a suffix of $w$ ($\mathrm{Suffix}(w)$) if $w = uv$. Further, $\mathrm{Prefix}(L) = \{\mathrm{Prefix}(w) : w \in L\}$ and $\mathrm{Suffix}(L) = \{\mathrm{Suffix}(w) : w \in L\}$. The notation $\Sigma_x$ means the set of symbols of $\Sigma$ that occur in the word $x$. $u_x$ means the word $u$ regarded as a sub-string of the word $x$. $|u|_x$ is the number of occurrences of $x$ in $u$. For a fixed occurrence of $x$ in $u$, $|u|^r_x$ is the total number of occurrences of $x$ to the right of that occurrence. We define a function $I_u$ over $\mathrm{sub}(u)$ by

$$I_u : \mathrm{sub}(u) \to \mathbb{N}, \qquad I_u(x) = |u|_x - |u|^r_x,$$

so that $I_u(x) = i$ exactly when the fixed occurrence is the $i$-th occurrence of $x$ in $u$, counting from the left.
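The occurrence index $I_u(x)$ can be sketched as follows. The sketch assumes that overlapping occurrences are counted and that an occurrence is identified by its start position; the text leaves both points implicit. The names `occurrences` and `occurrence_index` are hypothetical.

```python
def occurrences(u: str, x: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of x in u."""
    return [i for i in range(len(u) - len(x) + 1) if u.startswith(x, i)]


def occurrence_index(u: str, x: str, pos: int) -> int:
    """I_u(x) = |u|_x - |u|^r_x for the occurrence of x that starts at pos."""
    occ = occurrences(u, x)
    if pos not in occ:
        raise ValueError(f"no occurrence of {x!r} starts at position {pos}")
    total = len(occ)                                # |u|_x
    to_the_right = sum(1 for j in occ if j > pos)   # |u|^r_x
    return total - to_the_right


print([occurrence_index("abcab", "ab", p) for p in occurrences("abcab", "ab")])  # [1, 2]
```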

The class of regular languages is denoted by REG. Every finite automaton induces a right-invariant equivalence relation on the set of input strings. This is formalised in the following theorem (see [14]).

Theorem 1 (Myhill-Nerode). The following statements are equivalent.

1. The set $L \subseteq \Sigma^*$ is accepted by some finite automaton.

2. $L$ is the union of some of the equivalence classes of a right-invariant equivalence relation of finite index.

3. Let the equivalence relation $R_L$ be defined by $xR_Ly$ if and only if, for all $z \in \Sigma^*$, $xz \in L$ exactly when $yz \in L$. Then $R_L$ is of finite index.

2.2 Splicing 

A splicing rule (over an alphabet $\Sigma$) is a quadruple $(u_1, u_2, u_3, u_4)$ of words $u_1, u_2, u_3, u_4 \in \Sigma^*$. It is often written as $u_1\#u_2\$u_3\#u_4$, where $\#$ and $\$$ are splicing symbols that are not in $\Sigma$. A splicing rule $r = u_1\#u_2\$u_3\#u_4$ is applicable to two words $x = x_1u_1u_2x_2$ and $y = y_1u_3u_4y_2$. Splicing the words $x$ and $y$ by the rule $r$ produces two new words, $w_1 = x_1u_1u_4y_2$ and $w_2 = y_1u_3u_2x_2$. In this case we write $(x, y) \vdash_r (w_1, w_2)$. This operation is also called 2-splicing. We can instead take only $w_1$ as the result; the corresponding operation is then called 1-splicing and is denoted by $(x, y) \vdash_r w_1$.
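One 2-splicing step can be sketched as follows. The sketch assumes that a rule is passed as the 4-tuple `(u1, u2, u3, u4)`, and the helper name `splice` is hypothetical. The function enumerates every pair of sites at which the rule applies. Taking only the first component of each result pair gives 1-splicing.

```python
def splice(x: str, y: str, rule: tuple[str, str, str, str]) -> set[tuple[str, str]]:
    """All pairs (w1, w2) with (x, y) |-_r (w1, w2) for r = u1#u2$u3#u4."""
    u1, u2, u3, u4 = rule
    results = set()
    for i in range(len(x) - len(u1 + u2) + 1):
        if not x.startswith(u1 + u2, i):
            continue
        x1, x2 = x[:i], x[i + len(u1 + u2):]          # x = x1 u1 u2 x2
        for j in range(len(y) - len(u3 + u4) + 1):
            if not y.startswith(u3 + u4, j):
                continue
            y1, y2 = y[:j], y[j + len(u3 + u4):]      # y = y1 u3 u4 y2
            results.add((x1 + u1 + u4 + y2, y1 + u3 + u2 + x2))
    return results


print(splice("aabb", "ccdd", ("a", "b", "c", "d")))  # {('aadd', 'ccbb')}
```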

A pair $\sigma = (\Sigma, R)$, where $\Sigma$ is an alphabet and $R$ is a set of splicing rules, is called a splicing scheme or an $H$-scheme. For an $H$-scheme $\sigma = (\Sigma, R)$ and a language $L \subseteq \Sigma^*$, we define

$$\sigma(L) = \{w_1, w_2 \in \Sigma^* \mid x, y \in L,\ r \in R,\ (x, y) \vdash_r (w_1, w_2)\},$$

where $x, y, w_1, w_2$ and $r$ are as specified above. The iterative version of the splicing operation is defined as

$$\sigma^0(L) = L, \qquad \sigma^{i+1}(L) = \sigma^i(L) \cup \sigma(\sigma^i(L)), \qquad \sigma^*(L) = \bigcup_{i \ge 0} \sigma^i(L).$$

An $H$-system is a construct $H = (\Sigma, A, R)$, where $\Sigma$ is a finite alphabet, $A \subseteq \Sigma^*$ is a set of initial words over $\Sigma$ called axioms, and $R \subseteq \Sigma^*\#\Sigma^*\$\Sigma^*\#\Sigma^*$ is a set of splicing rules. The language generated by $H = (\Sigma, A, R)$ is $\sigma^*(A)$. Thus the language generated by the $H$-system is the set of all words that can be generated by starting with the words in $A$ and iteratively applying splicing rules from $R$ to the words already generated.

An $H$-system is called a null context $H$-system (NCH) if $R$ is a finite subset of $\Sigma^*$. The language generated by an NCH is the smallest language $L \subseteq \Sigma^*$ that contains $A$ and has the following property: whenever strings $wrx$ and $yrz$ are in $L$ with $r \in R$, the strings $wrz$ and $yrx$ are also in $L$. A language $L$ is called a null context splicing language (NCH-language) if there exists a null context splicing system that generates $L$ [12]. A simple $H$-system [19] is an $H$-system $(\Sigma, A, R)$ with $R \subseteq \Sigma$ such that, for $x, y, z \in \Sigma^*$ and $a \in R$, $(x, y) \vdash_a z$ if and only if $x = x_1ax_2$, $y = y_1ay_2$ and $z = x_1ay_2$ for some $x_1, x_2, y_1, y_2 \in \Sigma^*$. The family of simple $H$-systems is a subclass of the NCH systems. SH is the family of languages generated by simple splicing systems.
2.3 Constant

The concept of a constant, introduced by Schützenberger [26] many years before the theory of splicing was proposed, is a valuable conceptual tool for splicing theory. A string $c \in \Sigma^*$ is a constant for a language $L$ over an alphabet $\Sigma$ if, whenever $wcx$ and $ycz$ are in $L$, both $wcz$ and $ycx$ are also in $L$. A string $y$ is a factor of a string $w$ if $w = xyz$ for some $x, z \in \Sigma^*$, and $y$ is a factor of a language $L$ if $y$ is a factor of some string in $L$. Further, each rule of an NCH system $G$ is necessarily a constant for the language $L(G)$.
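For a finite language, the constant property can be tested by brute force, as in the sketch below. The sketch assumes that $c$ is non-empty and that $L$ is given as a finite Python set. The name `is_constant` is hypothetical.

```python
def is_constant(c: str, language: set[str]) -> bool:
    """Check whether the non-empty string c is a constant for a finite language."""
    if not c:
        raise ValueError("c must be non-empty in this sketch")
    # Collect every context (left, right) with left + c + right in the language.
    contexts = set()
    for w in language:
        for i in range(len(w) - len(c) + 1):
            if w.startswith(c, i):
                contexts.add((w[:i], w[i + len(c):]))
    # wcx, ycz in L must give wcz in L; pairing every left context with every
    # right context also covers ycx.
    return all(left + c + right in language
               for left, _ in contexts
               for _, right in contexts)


print(is_constant("c", {"acb", "dce"}))                # False
print(is_constant("c", {"acb", "dce", "ace", "dcb"}))  # True
```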

2.4 Strictly locally testable languages 

The concept of strictly locally testable languages was introduced by McNaughton and Papert in [18]. Later, De Luca and Restivo [17] gave a characterisation of these languages using the concept of constants [26]. We give the definition of strictly locally testable languages as in [18] and their characterisation as in [17].

Definition 1 [18] A subset $X$ of $A^+$ is called strictly locally testable if there exist a positive integer $k$ and three subsets $U, V, W$ of $A^k$ such that $X \cap A^kA^* = (UA^* \cap A^*V) \setminus A^*WA^*$.

The class of strictly locally testable languages is denoted by SLT.

Definition 2 (Characterisation of SLT [17]) A language $L$ is in SLT if there is a positive integer $k$ for which every factor of $L$ of length $k$ is a constant.
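Definition 1 gives a direct membership test for words of length at least $k$. The sketch below assumes that the sets $U$, $V$, $W$ are given explicitly as Python sets of length-$k$ strings. The helper `in_slt` is hypothetical.

```python
def in_slt(word: str, k: int, U: set[str], V: set[str], W: set[str]) -> bool:
    """Membership in (U A* ∩ A* V) minus A* W A* for a word of length >= k."""
    if len(word) < k:
        raise ValueError("Definition 1 only constrains words of length >= k")
    factors = {word[i:i + k] for i in range(len(word) - k + 1)}
    # Prefix must lie in U, suffix in V, and no length-k factor may lie in W.
    return word[:k] in U and word[-k:] in V and not (factors & W)


print(in_slt("aba", 2, {"ab"}, {"ba"}, {"bb"}))   # True
print(in_slt("abba", 2, {"ab"}, {"ba"}, {"bb"}))  # False: contains the factor bb
```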

3 Generalised Sequential Crossover 

Definition 3 A generalised sequential crossover scheme is a pair $GSCO = (\Sigma, R)$, where $\Sigma$ is a finite alphabet and $R \subseteq \Sigma^*$ is a finite set of overlapping strings. We write $GSCO = (\Sigma, R)$ as $GSCO_R$, which is also called an $R$-crossover. When $R$ is a singleton, say $R = \{x\}$, we write $GSCO_x$ instead of $GSCO_R$.

For a given GSCO scheme and two words $w_1 = u_1xv_1$ and $w_2 = u_2xv_2 \in \Sigma^*$, we define

$$GSCO_x(w_1, w_2) = \{u_1xv_2,\ u_2xv_1 \in \Sigma^* : w_1 = u_1xv_1,\ w_2 = u_2xv_2\}, \qquad x \in R.$$

The scheme is shown in Figure 1.

Instead of writing $GSCO_x(u_1xv_1, u_2xv_2)$, we also write $u_1xv_1 \bowtie u_2xv_2 = \{u_1xv_2, u_2xv_1\}$. This means that the two strings $u_1xv_1$ and $u_2xv_2$ cross over at the sub-string $x$ to generate the two new words $u_1xv_2$ and $u_2xv_1$. We write $u_1xv_1 \bowtie u_2xv_2 = \{u_1xv_2, u_2xv_1\}$ instead of $(u_1xv_1, u_2xv_2) \bowtie \{u_1xv_2, u_2xv_1\}$. Then

$$GSCO_R(w_1, w_2) = \bigcup_{x \in R} GSCO_x(w_1, w_2).$$
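The definitions of $GSCO_x$ and $GSCO_R$ can be sketched directly in Python. The sketch assumes that all (possibly overlapping) occurrences of $x$ are considered, which is how repeated overlaps are treated in the examples of this paper. The helper names are hypothetical.

```python
def occurrences(w: str, x: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of x in w."""
    return [i for i in range(len(w) - len(x) + 1) if w.startswith(x, i)]


def gsco_x(w1: str, w2: str, x: str) -> set[str]:
    """GSCO_x(w1, w2): cross over at every pair of occurrences of x."""
    if not x:
        raise ValueError("the empty word is not used as an overlapping string")
    out = set()
    for i in occurrences(w1, x):
        u1, v1 = w1[:i], w1[i + len(x):]          # w1 = u1 x v1
        for j in occurrences(w2, x):
            u2, v2 = w2[:j], w2[j + len(x):]      # w2 = u2 x v2
            out.add(u1 + x + v2)
            out.add(u2 + x + v1)
    return out


def gsco_r(w1: str, w2: str, R: set[str]) -> set[str]:
    """GSCO_R(w1, w2) as the union of GSCO_x(w1, w2) over x in R."""
    return set().union(*(gsco_x(w1, w2, x) for x in R))


print(sorted(gsco_x("ab", "ba", "a")))         # ['a', 'bab']
print(sorted(gsco_r("ab", "ba", {"a", "b"})))  # ['a', 'aba', 'b', 'bab']
```

An $x$ that does not occur in both words contributes nothing, which matches the remark that $R$ should consist of common sub-words.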

Obviously, $R$ should contain words that are sub-words of both $w_1$ and $w_2$; otherwise $GSCO_R(w_1, w_2)$ will be empty. We call the operation $GSCO_x$ with $x \in \Sigma$ a symbol overlapping GSCO. Similarly, we call $GSCO_x$ with $x \in \Sigma^*$ a string overlapping GSCO. Let $\mathrm{sub}(w)$ be the set of all sub-words of $w$. If a GSCO scheme has $R = \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)$, we simply write $GSCO(w_1, w_2)$. In other words, $GSCO(w_1, w_2)$ is the set of all words that can be generated by the GSCO of $w_1$ and $w_2$ with all possible overlappings:

$$GSCO(w_1, w_2) = \bigcup_{x \in \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)} GSCO_x(w_1, w_2).$$

We do not cross over two strings with $\epsilon$ as the overlapping string.¹

¹ Out of curiosity, we record the result

$$GSCO_\epsilon(w_1, w_2) = \mathrm{Prefix}(w_1)\cdot\mathrm{Suffix}(w_2) \cup \mathrm{Prefix}(w_2)\cdot\mathrm{Suffix}(w_1).$$
We extend the above definition to languages. Given any two languages $L_1$ and $L_2$ over the alphabets $\Sigma_1$ and $\Sigma_2$, respectively, such that $\Sigma_1 \cap \Sigma_2 \neq \emptyset$, we define

$$GSCO_R(L_1, L_2) = \bigcup_{w_1 \in L_1,\ w_2 \in L_2} GSCO_R(w_1, w_2).$$

Here the underlying crossover scheme is $GSCO = (\Sigma_1 \cup \Sigma_2, R)$. As mentioned earlier, when $R = \mathrm{sub}(L_1) \cap \mathrm{sub}(L_2)$, that is, when $R$ is the set of all possible overlaps between a word of $L_1$ and a word of $L_2$, we write

$$GSCO(L_1, L_2) = \bigcup_{w_1 \in L_1,\ w_2 \in L_2} GSCO(w_1, w_2).$$

$GSCO(L, L)$ is written simply as $GSCO(L)$.

We record some results, whose proofs are immediate. 

Proposition 1 Let $u, v \in \Sigma^*$.

1. $GSCO_x(u, v) = GSCO_y(u, v)$, where the sub-word $x$ occurs in $y$ only once and no two symbols of $x$ are the same.

2. $GSCO_x(u, v) \supseteq GSCO_y(u, v)$, where $x$ is a sub-word of $y$.

3. If $R \subseteq R'$, then $GSCO_R(u, v) \subseteq GSCO_{R'}(u, v)$.

4. $GSCO_{R_1 \cup R_2}(u, v) = GSCO_{R_1}(u, v) \cup GSCO_{R_2}(u, v)$.

5. $GSCO_{R_1 \cap R_2}(u, v) \subseteq GSCO_{R_1}(u, v) \cap GSCO_{R_2}(u, v)$.

6. $GSCO_a(GSCO_a(u, v), u) = GSCO_a(u, GSCO_a(u, v))$, $a \in \Sigma$.

7. $GSCO_a(GSCO_a(u, v), v) = GSCO_a(v, GSCO_a(u, v))$, $a \in \Sigma$.

8. If $GSCO_a(u, v) = \{x, y\}$, then $GSCO_a(x, y) = \{u, v\}$; i.e. the operation $GSCO_a$, $a \in \Sigma$, is reversible.

9. The lengths of the words in $GSCO(u, v)$ range from $1$ to $|u| + |v| - 1$.

10. $GSCO(w, w) = \{w\}$ if no two symbols of $w$ are the same.

11. The GSCO operation is not associative over words, but it is commutative over words. In fact, $GSCO(L_1, L_2) = GSCO(L_2, L_1)$.

12. $GSCO(\{a^i\}) = \{a, a^2, \ldots, a^{2i-1}\}$.

13. For any two languages $L_1$ and $L_2$,

$$GSCO(L_1 \cup L_2) = GSCO(L_1) \cup GSCO(L_2) \cup GSCO(L_1, L_2).$$

14. $GSCO(w, w^R) \supseteq \{uau^R : ua \in \mathrm{Prefix}(w),\ a \in \Sigma\}$.

15. For any two words $w_1, w_2 \in \Sigma^*$ and $x \in \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)$,

$$GSCO_x(w_1, w_2) \subseteq GSCO_{a \in \Sigma_x}(w_1, w_2).$$

That is, if a word is generated by a string overlapping ($x$ overlapping) GSCO of $w_1$ and $w_2$, then the word can also be generated by a symbol overlapping GSCO over a symbol that occurs in $x$.

Proof. All but the last of the above statements follow directly from the definition. We only prove the last one (statement 15). Let $u \in GSCO_x(w_1, w_2)$ with $x \in \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)$. If $x \in \Sigma$, the proof is immediate. Let $x \notin \Sigma$, i.e. $|x| \ge 2$, and let $x = a_1a_2\cdots a_n$, where some of the $a_i$'s may be the same. Suppose $w_1 = u_1a_1a_2\cdots a_nv_1$ and $w_2 = u_2a_1a_2\cdots a_nv_2$. Then $u \in \{u_1a_1a_2\cdots a_nv_2,\ u_2a_1a_2\cdots a_nv_1\}$.

Case I: $u = u_1a_1a_2\cdots a_nv_2$. Overlapping the $i$-th symbols of the two occurrences of $x$ yields the same word, so

$$u \in GSCO_{a_i}(w_1, w_2),\ i \in \{1, 2, \ldots, n\} \implies u \in GSCO_{a \in \Sigma_x}(w_1, w_2).$$

Hence $GSCO_x(w_1, w_2) \subseteq GSCO_{a \in \Sigma_x}(w_1, w_2)$.

Case II: $u = u_2a_1a_2\cdots a_nv_1$. The result follows similarly. This completes the proof.

Note 1. The converse of statement 15 is not true; i.e. in general

$$GSCO_{a \in \Sigma_x}(w_1, w_2) \not\subseteq GSCO_x(w_1, w_2).$$

For example, $GSCO_{aba}(c_1abac_2, d_1abad_2) = \{c_1abad_2, d_1abac_2\}$, but $GSCO_a(c_1abac_2, d_1abad_2) = \{c_1abad_2, d_1abac_2, c_1ad_2, d_1ababac_2, c_1ababad_2, d_1ac_2\}$.
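The counterexample of Note 1 can be checked mechanically with the sketch below. The single letters `c`, `e`, `d`, `f` stand in for the symbols $c_1$, $c_2$, $d_1$, $d_2$, an assumption made only so that words are plain Python strings. The sketch computes the string overlapping GSCO over $aba$ and the symbol overlapping GSCO over $\Sigma_{aba} = \{a, b\}$. Helper names are hypothetical.

```python
def occurrences(w: str, x: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of x in w."""
    return [i for i in range(len(w) - len(x) + 1) if w.startswith(x, i)]


def gsco_x(w1: str, w2: str, x: str) -> set[str]:
    """GSCO_x(w1, w2) over every pair of occurrences of x."""
    out = set()
    for i in occurrences(w1, x):
        for j in occurrences(w2, x):
            out.add(w1[:i] + x + w2[j + len(x):])   # u1 x v2
            out.add(w2[:j] + x + w1[i + len(x):])   # u2 x v1
    return out


w1, w2 = "cabae", "dabaf"            # c1 a b a c2 and d1 a b a d2
string_overlap = gsco_x(w1, w2, "aba")
symbol_overlap = gsco_x(w1, w2, "a") | gsco_x(w1, w2, "b")

print(string_overlap <= symbol_overlap)  # True: statement 15
print(symbol_overlap <= string_overlap)  # False: the converse fails
```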
Example 1. $GSCO(\{a^n : n \ge 1\}) = a^+$.

Example 2. $GSCO(\{a^nb^n : n \ge 1\}) = a^+b^+$.

Example 3. Let $L = \{ab, ba, bb\}$. Then

$GSCO(ab, ab) = \{ab\}$, $\quad GSCO(ab, ba) = \{a, b, bab, aba\}$,

$GSCO(ab, bb) = \{ab, bb, b, abb\}$, $\quad GSCO(ba, bb) = \{b, bb, ba, bba\}$,

$GSCO(ba, ba) = \{ba\}$, $\quad GSCO(bb, bb) = \{b, bb, bbb\}$.

So we have

$$GSCO(L) = \{a, b, ab, ba, bb, aba, bab, abb, bba, bbb\}.$$

Example 4. $GSCO(\{a, b\}) = \{a, b\}$.

Example 5. $GSCO(\{abcab, c\}) = \{c, ab, abc, cab, abcab, abcabcab\}$.
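A brute-force computation of $GSCO(L)$ for a finite $L$, taken straight from the definition with all common non-empty sub-words as overlaps, is sketched below. It can be used to check the finite examples above. The function names are hypothetical, and the approach is deliberately naive.

```python
from itertools import product


def occurrences(w: str, x: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of x in w."""
    return [i for i in range(len(w) - len(x) + 1) if w.startswith(x, i)]


def substrings(w: str) -> set[str]:
    """All non-empty sub-words of w."""
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}


def gsco(w1: str, w2: str) -> set[str]:
    """GSCO(w1, w2): union of GSCO_x over all common non-empty sub-words x."""
    out = set()
    for x in substrings(w1) & substrings(w2):
        for i in occurrences(w1, x):
            for j in occurrences(w2, x):
                out.add(w1[:i] + x + w2[j + len(x):])
                out.add(w2[:j] + x + w1[i + len(x):])
    return out


def gsco_lang(L: set[str]) -> set[str]:
    """GSCO(L) = GSCO(L, L) for a finite language L."""
    out = set()
    for w1, w2 in product(L, repeat=2):
        out |= gsco(w1, w2)
    return out


print(sorted(gsco_lang({"ab", "ba", "bb"}), key=lambda w: (len(w), w)))
# ['a', 'b', 'ab', 'ba', 'bb', 'aba', 'abb', 'bab', 'bba', 'bbb']
```

Because every word overlaps itself at the same position, $L \subseteq GSCO(L)$ always holds for a language of non-empty words.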

To compute $GSCO(w_1, w_2)$, one first has to find all the common sub-strings $x$ and then compute $\bigcup_x GSCO_x(w_1, w_2)$. For $GSCO(L)$ we have to compute $\bigcup_{w_1, w_2 \in L} GSCO(w_1, w_2)$. In short,

$$GSCO(L) = \bigcup_{w_1, w_2 \in L}\ \bigcup_{x \in \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)} GSCO_x(w_1, w_2),$$

which makes GSCO expensive to compute. The following theorem removes the tedious step of finding all the common sub-strings of all pairs of words of a given language $L$.

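Theorem 2. Let $w_1, w_2 \in \Sigma^*$. Then

$$GSCO(w_1, w_2) = \bigcup_{a \in \Sigma_{w_1} \cap \Sigma_{w_2}} GSCO_a(w_1, w_2).$$

Proof. Since

$$GSCO(w_1, w_2) = \bigcup_{x \in \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)} GSCO_x(w_1, w_2),$$

it is enough to prove that

$$\bigcup_{x \in \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)} GSCO_x(w_1, w_2) = \bigcup_{a \in \Sigma_{w_1} \cap \Sigma_{w_2}} GSCO_a(w_1, w_2).$$

Since $\Sigma_{w_1} \cap \Sigma_{w_2} \subseteq \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)$, result 3 of Proposition 1 gives

$$GSCO_{\Sigma_{w_1} \cap \Sigma_{w_2}}(w_1, w_2) \subseteq GSCO_{\mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)}(w_1, w_2),$$

that is,

$$\bigcup_{a \in \Sigma_{w_1} \cap \Sigma_{w_2}} GSCO_a(w_1, w_2) \subseteq \bigcup_{x \in \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)} GSCO_x(w_1, w_2).$$

To prove the reverse inclusion, let $u \in GSCO_x(w_1, w_2)$. If $x \in \Sigma_{w_1} \cap \Sigma_{w_2}$, the result is obvious. Suppose instead that $|x| \ge 2$, i.e. $x$ is a common sub-string of $w_1$ and $w_2$ of length at least two. By result 15 of Proposition 1, there exists a symbol $a \in \Sigma_x$ such that $u \in GSCO_a(w_1, w_2)$. Since $a \in \Sigma_x$ and $x \in \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)$, we have $a \in \Sigma_{w_1} \cap \Sigma_{w_2}$. This implies $u \in GSCO_{a \in \Sigma_{w_1} \cap \Sigma_{w_2}}(w_1, w_2)$. Hence

$$GSCO_x(w_1, w_2) \subseteq GSCO_{a \in \Sigma_{w_1} \cap \Sigma_{w_2}}(w_1, w_2)$$

$$\implies \bigcup_{x \in \mathrm{sub}(w_1) \cap \mathrm{sub}(w_2)} GSCO_x(w_1, w_2) \subseteq \bigcup_{a \in \Sigma_{w_1} \cap \Sigma_{w_2}} GSCO_a(w_1, w_2).$$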
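Corollary 1. $GSCO(w_1, w_2) = \bigcup_{a \in \Sigma} GSCO_a(w_1, w_2)$.

Proof. By Theorem 2, it is enough to prove that

$$\bigcup_{a \in \Sigma} GSCO_a(w_1, w_2) = \bigcup_{a \in \Sigma_{w_1} \cap \Sigma_{w_2}} GSCO_a(w_1, w_2).$$

The alphabet can be written as

$$\Sigma = (\Sigma_{w_1} \cap \Sigma_{w_2}) \cup \Delta, \qquad (1)$$

where $\Delta$ contains the symbols of $\Sigma$ that are not in $\Sigma_{w_1} \cap \Sigma_{w_2}$. In other words, with respect to the words $w_1$ and $w_2$, the alphabet $\Sigma$ is the disjoint union of these two sets. A symbol of $\Delta$ does not occur in both words, so

$$\bigcup_{a \in \Delta} GSCO_a(w_1, w_2) = \emptyset. \qquad (2)$$

By result 4 of Proposition 1, (1) implies

$$\bigcup_{a \in \Sigma} GSCO_a(w_1, w_2) = \Big(\bigcup_{a \in \Sigma_{w_1} \cap \Sigma_{w_2}} GSCO_a(w_1, w_2)\Big) \cup \Big(\bigcup_{a \in \Delta} GSCO_a(w_1, w_2)\Big),$$

and together with (2) this gives

$$\bigcup_{a \in \Sigma} GSCO_a(w_1, w_2) = \bigcup_{a \in \Sigma_{w_1} \cap \Sigma_{w_2}} GSCO_a(w_1, w_2).$$

This completes the proof.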

Corollary 2. $GSCO(L) = \bigcup_{w_1, w_2 \in L}\ \bigcup_{a \in \Sigma} GSCO_a(w_1, w_2)$.

Proof. By Corollary 1,

$$GSCO(L) = \bigcup_{w_1, w_2 \in L} GSCO(w_1, w_2) = \bigcup_{w_1, w_2 \in L}\ \bigcup_{a \in \Sigma} GSCO_a(w_1, w_2).$$

This corollary tells us that, to compute $GSCO(L)$, it is enough to compute the GSCO of $w_1$ and $w_2$ over the symbols of the alphabet $\Sigma$ and take the union of all the resulting $GSCO_a(w_1, w_2)$'s.
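The practical content of Corollary 2 is easy to check on small languages. The sketch below computes $GSCO(L)$ twice, once from all common sub-words and once from single common symbols only, and compares the results. Helper names are hypothetical. Checking examples is a sanity test, not a proof.

```python
from itertools import product


def occurrences(w: str, x: str) -> list[int]:
    return [i for i in range(len(w) - len(x) + 1) if w.startswith(x, i)]


def gsco_x(w1: str, w2: str, x: str) -> set[str]:
    out = set()
    for i in occurrences(w1, x):
        for j in occurrences(w2, x):
            out.add(w1[:i] + x + w2[j + len(x):])
            out.add(w2[:j] + x + w1[i + len(x):])
    return out


def substrings(w: str) -> set[str]:
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}


def gsco_by_substrings(L: set[str]) -> set[str]:
    """GSCO(L) from the definition: overlap over every common sub-word."""
    return {z for w1, w2 in product(L, repeat=2)
            for x in substrings(w1) & substrings(w2)
            for z in gsco_x(w1, w2, x)}


def gsco_by_symbols(L: set[str]) -> set[str]:
    """GSCO(L) via Corollary 2: overlap over single common symbols only."""
    return {z for w1, w2 in product(L, repeat=2)
            for a in set(w1) & set(w2)
            for z in gsco_x(w1, w2, a)}


L = {"abcab", "abab", "ba"}
print(gsco_by_symbols(L) == gsco_by_substrings(L))  # True
```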



3.1 CGSCO 

We mention a special type of the operation GSCO, namely Corresponding GSCO (CGSCO).

Definition 4 (CGSCO) Given any two words $w_1, w_2$, let $x$ be a common sub-string of them such that $x$ occurs more than once in both $w_1$ and $w_2$. We cross over $w_1$ and $w_2$ in such a way that the first occurrence of $x$ in $w_1$ overlaps with the first occurrence of $x$ in $w_2$, the second occurrence in $w_1$ with the second occurrence in $w_2$, and so on. We call such a GSCO a Corresponding GSCO.

For example, $CGSCO(abcab, abab) = \{abab, abcab\}$. The sub-strings that occur more than once in both strings are $ab$, $a$ and $b$. Here we do not allow the first occurrence of $ab$ in $abcab$ to overlap with the second occurrence of $ab$ in $abab$; that overlap would produce the word $ab$.
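Definition 4 can be sketched by pairing the $k$-th occurrence of $x$ in $w_1$ with the $k$-th occurrence in $w_2$. As in the definition, the sketch uses only sub-strings that occur more than once in both words. It also assumes that overlapping occurrences are counted. The name `cgsco` is hypothetical.

```python
def occurrences(w: str, x: str) -> list[int]:
    return [i for i in range(len(w) - len(x) + 1) if w.startswith(x, i)]


def substrings(w: str) -> set[str]:
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}


def cgsco(w1: str, w2: str) -> set[str]:
    """Corresponding GSCO: the k-th occurrence of x in w1 meets the k-th in w2."""
    out = set()
    for x in substrings(w1) & substrings(w2):
        occ1, occ2 = occurrences(w1, x), occurrences(w2, x)
        if len(occ1) < 2 or len(occ2) < 2:
            continue  # Definition 4 uses only sub-strings repeated in both words
        for i, j in zip(occ1, occ2):  # zip pairs occurrences with equal index
            out.add(w1[:i] + x + w2[j + len(x):])
            out.add(w2[:j] + x + w1[i + len(x):])
    return out


print(sorted(cgsco("abcab", "abab")))  # ['abab', 'abcab']
```

Because `zip` stops at the shorter list, only $\min\{|w_1|_x, |w_2|_x\}$ pairs are formed, which matches the count used in the proof of Theorem 3.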

As seen in result 15 of Proposition 1, we have

$$GSCO_x(w_1, w_2) \subseteq GSCO_{a \in \Sigma_x}(w_1, w_2).$$

For some GSCOs equality holds; i.e. for every symbol overlapping GSCO of $w_1$ and $w_2$ there exists a string overlapping GSCO of $w_1$ and $w_2$ that produces the same words. If $x$ is a common sub-string of $w_1$ and $w_2$, then every sub-string of $x$ is also a common sub-string, so GSCO can also occur by overlapping a sub-string of $x$. Result 2 of Proposition 1 tells us that $GSCO_x(w_1, w_2) \supseteq GSCO_y(w_1, w_2)$ when $x$ is a sub-string of $y$. To compute $GSCO(w_1, w_2)$ we have to consider all the possible common sub-strings. However, for GSCO systems that satisfy the property $GSCO_{a \in \Sigma_x}(w_1, w_2) = GSCO_x(w_1, w_2)$, Theorem 2 shows that it is enough to compute $GSCO_{a \in \Sigma_x}(w_1, w_2)$ for the maximal common sub-strings $x$ of $w_1$ and $w_2$. (A common sub-string $x$ is said to be maximal if there is no common sub-string $y \neq x$ such that $x$ is a sub-string of $y$.) That is,

$$GSCO(w_1, w_2) = \bigcup_{x} GSCO_{a \in \Sigma_x}(w_1, w_2),$$

where $x$ ranges over the maximal common sub-strings of $w_1$ and $w_2$.
Theorem 3. A GSCO is a CGSCO if and only if

$$GSCO_x(w_1, w_2) = \bigcup_{I_{w_1}(x) = I_{w_2}(x)} GSCO_{a|x}(w_1, w_2).$$

Here $a|x$ denotes any symbol $a$ of the sub-string $x$, taken at occurrences of $x$ with $I_{w_1}(x) = I_{w_2}(x)$. $GSCO_{a|x}$ is the operation in which the overlapping occurs over the symbol $a$ inside the sub-string $x$ and nowhere else.

Proof. Let the GSCO be a CGSCO. Let $w_1$ and $w_2$ be any two words, and let $x$ be a common sub-string of $w_1$ and $w_2$ with $|w_1|_x = |w_2|_x = n$. Since the GSCO is a CGSCO, $w_1$ and $w_2$ can cross over at $x$ only $n$ times. Let $x$ occur $n$ times in $w_1$ and $m$ times in $w_2$.

To calculate $GSCO_x(w_1, w_2)$, we have to consider all the possible overlappings of $x$; i.e. any occurrence of $x$ in $w_1$ can overlap with any occurrence of $x$ in $w_2$. Let $x = a_1a_2\cdots a_k$, $w_1 = u_1xu_2xu_3\cdots xu_{n+1}$ and $w_2 = v_1xv_2xv_3\cdots xv_{m+1}$. Since we have assumed that the GSCO is a CGSCO, we only have to consider those overlappings of $x$ with $I_{w_1}(x) = I_{w_2}(x)$. That is, we calculate $GSCO_x(w_1, w_2)$ when the $i$-th occurrence of $x$ in $w_1$ overlaps with the $i$-th occurrence of $x$ in $w_2$. In this case,

$$GSCO_x(w_1, w_2) = \{u_1xu_2x\cdots u_ixv_{i+1}x\cdots xv_{n+1},\ v_1xv_2x\cdots v_ixu_{i+1}x\cdots xu_{n+1} : 1 \le i \le n\}. \qquad (3)$$

We consider the sub-string $x$ in $w_1$ and the sub-string $x$ in $w_2$ such that $I_{w_1}(x) = I_{w_2}(x)$. This means that the $i$-th occurrence of $x$ in $w_1$ must cross over with the $i$-th occurrence of $x$ in $w_2$.

Consider $x$ such that $I_{w_1}(x) = I_{w_2}(x) = 1$, i.e. the first occurrence of $x$ in both $w_1$ and $w_2$. Let $a$ be any symbol in the sub-string $x$, say $a = a_i$.

By hypothesis, the GSCO is a CGSCO. We compute $CGSCO_{a|x}(w_1, w_2)$. Here the overlapping occurs between the $i$-th symbol $a$ of the occurrence $x_{w_1}$ and the $i$-th symbol $a$ of the occurrence $x_{w_2}$. In our case, if $a$ is the $i$-th symbol of $x_{w_1}$, then $a$ is also the $i$-th symbol of $x_{w_2}$. The symbol $a$ can occur many times in $x$, but for $CGSCO_{a|x}$ the overlapping of $a$ has to take place at the corresponding position. Then

$$\begin{aligned}
CGSCO_{a|x}(w_1, w_2) &= CGSCO_{a|x}(u_1a_1\cdots a_ku_2x\cdots xu_{n+1},\ v_1a_1\cdots a_kv_2x\cdots xv_{n+1})\\
&= \{u_1a_1\cdots a_ia_{i+1}\cdots a_kv_2x\cdots xv_{n+1},\ v_1a_1\cdots a_ia_{i+1}\cdots a_ku_2x\cdots xu_{n+1}\}\\
&= \{u_1xv_2x\cdots xv_{n+1},\ v_1xu_2x\cdots xu_{n+1}\}.
\end{aligned}$$

If $a = a_j$ and the crossover occurs over $a_j$ in both $x_{w_1}$ and $x_{w_2}$, the calculation is similar and gives

$$\begin{aligned}
CGSCO_{a|x}(w_1, w_2) &= CGSCO_{a|x}(u_1a_1\cdots a_ku_2x\cdots xu_{n+1},\ v_1a_1\cdots a_kv_2x\cdots xv_{n+1})\\
&= \{u_1a_1\cdots a_ja_{j+1}\cdots a_kv_2x\cdots xv_{n+1},\ v_1a_1\cdots a_ja_{j+1}\cdots a_ku_2x\cdots xu_{n+1}\}\\
&= \{u_1xv_2x\cdots xv_{n+1},\ v_1xu_2x\cdots xu_{n+1}\}.
\end{aligned}$$

It does not matter how many times $a$ is repeated in $x$, because the crossover takes place only at its position of occurrence within the sub-string $x$ of both words.

We repeat the above argument for $x$ such that $I_{w_1}(x) = I_{w_2}(x) = 2$. Arguing along similar lines,

$$GSCO_{a|x}(w_1, w_2) = \{u_1xu_2xv_3\cdots xv_{n+1},\ v_1xv_2xu_3\cdots xu_{n+1}\}.$$

Similarly, when $I_{w_1}(x) = I_{w_2}(x) = i$, we have

$$GSCO_{a|x}(w_1, w_2) = \{u_1xu_2x\cdots u_ixv_{i+1}\cdots xv_{n+1},\ v_1xv_2x\cdots v_ixu_{i+1}\cdots xu_{n+1}\}.$$

So we have

$$\bigcup_{j=1}^{\min\{|w_1|_x, |w_2|_x\}}\ \bigcup_{I_{w_1}(x) = I_{w_2}(x) = j} GSCO_{a|x}(w_1, w_2) = \{u_1xu_2x\cdots u_jxv_{j+1}\cdots xv_{n+1},\ v_1xv_2x\cdots v_jxu_{j+1}\cdots xu_{n+1} : j = 1, 2, \ldots, \min\{|w_1|_x, |w_2|_x\}\}. \qquad (4)$$

By (3) and (4), the claim follows.
Conversely, given

$$GSCO_x(w_1, w_2) = \bigcup_{I_{w_1}(x) = I_{w_2}(x)} GSCO_{a|x}(w_1, w_2),$$

we show that the GSCO is a CGSCO.

Suppose, for contradiction, that the GSCO is not a CGSCO. Choose $w_1 = u_1xu_2xu_3\cdots xu_{n+1}$ and $w_2 = v_1xv_2xv_3\cdots xv_{n+1}$. As noted earlier, the number of $x$-overlappings for a CGSCO depends on the minimum number of occurrences of $x$ in the two words to be self-assembled. Hence, without loss of generality, we may assume that both words contain the same number of occurrences of $x$.

When the first occurrence of $x$ in $w_1$ overlaps with the third occurrence of $x$ in $w_2$, we get two new words:

$$u_1xv_4xv_5\cdots xv_{n+1},\ v_1xv_2xv_3xu_2xu_3\cdots xu_{n+1} \in GSCO_x(w_1, w_2). \qquad (5)$$

These strings can only be generated by $GSCO_{a|x}(w_1, w_2)$ with $I_{w_1}(x) = 1$ and $I_{w_2}(x) = 3$; they cannot be generated by $GSCO_{a|x}(w_1, w_2)$ with $I_{w_1}(x) = I_{w_2}(x)$. Hence

$$u_1xv_4xv_5\cdots xv_{n+1},\ v_1xv_2xv_3xu_2xu_3\cdots xu_{n+1} \notin \bigcup_{I_{w_1}(x) = I_{w_2}(x)} GSCO_{a|x}(w_1, w_2). \qquad (6)$$

Together, (5) and (6) contradict our hypothesis. Hence the GSCO is a CGSCO.

Corollary 3. $CGSCO_x(w_1, w_2) = CGSCO_{y|x}(w_1, w_2)$, where $y$ is a sub-string of $x$.

Proof. The argument follows the same lines as in the previous theorem. Since we are dealing with a CGSCO, the first $x$ of $w_1$ will match with the first $x$ of $w_2$. Again in this also



4 1-GSCO and 2-GSCO 

In the theory of splicing, two types of splicing operations have been considered: 1-splicing, where applying a rule to two words generates (or retains) only one word, and 2-splicing, where both words are generated.

Along similar lines, we introduce two operations: 1-GSCO and 2-GSCO. The GSCO of the words $w_1$ and $w_2$ generates two new words each time $w_1$ and $w_2$ overlap over a common sub-string $x$. For a common sub-string $x$, different overlaps are also possible, and the collection of all such words is denoted by $GSCO_x(w_1, w_2)$. $GSCO(w_1, w_2)$ is the collection of all possible $GSCO_x(w_1, w_2)$'s, and $GSCO(L)$ is the collection of all $GSCO(w_1, w_2)$'s over all possible pairs $w_1, w_2 \in L$. Hence the operation GSCO is made up of many overlappings, each generating two words.

The operation GSCO is called 1-GSCO if, in every overlapping concerned, the only word considered to be generated is the one consisting of a prefix of the first word and a suffix of the second word. So $1GSCO_x(u_1xv_1, u_2xv_2) = \{u_1xv_2\}$; i.e. each overlapping in 1GSCO generates only one word. We denote 1GSCO by $\rtimes$.

The operation GSCO is called 2GSCO if, in every overlapping concerned, we consider both of the words generated. So the operation 2GSCO coincides with GSCO.

Fig. 2. A scheme for 1GSCO of two strings, $1GSCO_x(u_1xv_1, u_2xv_2) = \{u_1xv_2\}$. The scheme for the output string $u_1xv_2$ is prominently shown. The grey part is the discarded self-assembled string.

Lemma 1 For any two words $w_1, w_2$:

1. $1GSCO_x(w_1, w_2) = 1GSCO_x(w_2, w_1)$ if $w_1 = w_2$. (The converse fails; for example, $w_1 = aa$, $w_2 = aaa$ and $x = a$ give equal sets.)

2. $1GSCO(w_1, w_2) \subseteq 2GSCO(w_1, w_2)$.

3. $1GSCO(w_1, w_2) \cup 1GSCO(w_2, w_1) = 2GSCO(w_1, w_2)$.

For any two languages $L_1$ and $L_2$:

4. $1GSCO(L_1, L_2) \subseteq 2GSCO(L_1, L_2)$.

5. $1GSCO(L) = 2GSCO(L) = GSCO(L)$.

Proof. Results 1, 2, 3 and 4 are obvious. We prove result 5. When the language $L$ is a singleton set, $1GSCO(L) = 2GSCO(L)$ by result 3. In general,

$$\begin{aligned}
1GSCO(L) &= \bigcup_{w_1, w_2 \in L} 1GSCO(\{w_1, w_2\})\\
&= \bigcup_{w_1, w_2 \in L} \big(1GSCO(w_1, w_2) \cup 1GSCO(w_2, w_1) \cup 1GSCO(w_1, w_1) \cup 1GSCO(w_2, w_2)\big)\\
&= \bigcup_{w_1, w_2 \in L} \big(2GSCO(w_1, w_2) \cup 2GSCO(w_1, w_1) \cup 2GSCO(w_2, w_2)\big)\\
&= \bigcup_{w_1, w_2 \in L} 2GSCO(\{w_1, w_2\})\\
&= 2GSCO(L).
\end{aligned}$$

Since $2GSCO(L)$ is just $GSCO(L)$, the result follows.

For finite $H$-systems, the 1-splicing operation is more powerful than 2-splicing, but in GSCO systems the two coincide. By result 5 of Lemma 1, to calculate $GSCO(L)$ it is enough to calculate $1GSCO(L)$, which is equal to $GSCO(L)$. From now on, $GSCO(L)$ means either $1GSCO(L)$ or $2GSCO(L)$.
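Result 5 of Lemma 1 can be illustrated on a small language. Although $1GSCO(w_1, w_2)$ is in general smaller than $2GSCO(w_1, w_2)$, the two agree on $L$ once all ordered pairs are taken. The helpers below are hypothetical and compute everything by brute force.

```python
from itertools import product


def occurrences(w: str, x: str) -> list[int]:
    return [i for i in range(len(w) - len(x) + 1) if w.startswith(x, i)]


def substrings(w: str) -> set[str]:
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}


def gsco_pair(w1: str, w2: str, one_sided: bool) -> set[str]:
    """1GSCO(w1, w2) if one_sided else 2GSCO(w1, w2), over all common sub-words."""
    out = set()
    for x in substrings(w1) & substrings(w2):
        for i in occurrences(w1, x):
            for j in occurrences(w2, x):
                out.add(w1[:i] + x + w2[j + len(x):])      # prefix of w1, x, suffix of w2
                if not one_sided:
                    out.add(w2[:j] + x + w1[i + len(x):])  # the second word kept by 2GSCO
    return out


def gsco_lang(L: set[str], one_sided: bool) -> set[str]:
    """Union of gsco_pair over all ordered pairs (w1, w2) of L."""
    out = set()
    for w1, w2 in product(L, repeat=2):
        out |= gsco_pair(w1, w2, one_sided)
    return out


print(sorted(gsco_pair("ab", "ba", one_sided=True)))   # ['a', 'aba']
print(sorted(gsco_pair("ab", "ba", one_sided=False)))  # ['a', 'aba', 'b', 'bab']
L = {"ab", "ba", "bb"}
print(gsco_lang(L, one_sided=True) == gsco_lang(L, one_sided=False))  # True
```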

5 Iterated GSCO 

Definition 5 Given a language $L$, we define the language obtained from $L$ by unrestricted iterated application of GSCO. This language, called the unrestricted GSCO closure of $L$ and denoted by $uGSCO^*(L)$, is defined as

$$uGSCO^0(L) = L, \qquad uGSCO^{i+1}(L) = uGSCO^i(L) \cup GSCO(uGSCO^i(L)), \qquad uGSCO^*(L) = \bigcup_{i \ge 0} uGSCO^i(L).$$

Clearly, $uGSCO^*(L)$ is the smallest language that contains $L$ and is closed under GSCO; that is, it is the smallest language $K$ such that $L \subseteq K$ and $GSCO(K) \subseteq K$. In other words, one starts with any pair of words in $L$ and applies GSCO iteratively to any pair of words previously produced, collecting all the words obtained.
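Since $uGSCO^*(L)$ can be infinite (iterating from $\{aa\}$, for example, yields ever longer words over $a$), a program can only compute finitely many stages. The sketch below computes $uGSCO^i(L)$ for a given number of rounds, using single-symbol overlaps as justified by Corollary 2. The function names are hypothetical.

```python
from itertools import product


def occurrences(w: str, x: str) -> list[int]:
    return [i for i in range(len(w) - len(x) + 1) if w.startswith(x, i)]


def gsco_lang(L: set[str]) -> set[str]:
    """GSCO(L) using only single-symbol overlaps (Corollary 2)."""
    out = set()
    for w1, w2 in product(L, repeat=2):
        for a in set(w1) & set(w2):
            for i in occurrences(w1, a):
                for j in occurrences(w2, a):
                    out.add(w1[:i] + a + w2[j + 1:])
                    out.add(w2[:j] + a + w1[i + 1:])
    return out


def u_gsco_rounds(L: set[str], rounds: int) -> set[str]:
    """uGSCO^i(L) for i = rounds; stops early if a fixed point is reached."""
    current = set(L)
    for _ in range(rounds):
        nxt = current | gsco_lang(current)
        if nxt == current:   # closed under GSCO, so this is already uGSCO*(L)
            break
        current = nxt
    return current


print(sorted(u_gsco_rounds({"ab", "ba"}, 1), key=lambda w: (len(w), w)))
# ['a', 'b', 'ab', 'ba', 'aba', 'bab']
```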
Definition 6 For a word $w$ and a sub-string $x$ of $w$, we define $\mathrm{Prefix}_x(w)$ and $\mathrm{Suffix}_x(w)$ as follows:

$$\mathrm{Prefix}_x(w) = \{u : uxu' = w;\ u, u' \in \Sigma^*\}, \qquad \mathrm{Suffix}_x(w) = \{s : s'xs = w;\ s, s' \in \Sigma^*\},$$

$$\mathrm{Prefix}_x(L) = \bigcup_{w \in L} \mathrm{Prefix}_x(w), \qquad \mathrm{Suffix}_x(L) = \bigcup_{w \in L} \mathrm{Suffix}_x(w).$$



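It is clear that, for the overlapping over $x$,

$$w_1 \rtimes w_2 = \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Suffix}_x(w_2).$$

Lemma 2 For any word $w$,

$$\mathrm{Prefix}_x(\mathrm{Suffix}_x(w)) = \mathrm{Suffix}_x(\mathrm{Prefix}_x(w)).$$

Proof. For any $u \in \Sigma^*$,

$$\begin{aligned}
u \in \mathrm{Suffix}_x(\mathrm{Prefix}_x(w)) &\iff \exists u' \in \Sigma^* \text{ such that } u'xu \in \mathrm{Prefix}_x(w)\\
&\iff \exists u'' \in \Sigma^* \text{ such that } u'xuxu'' = w\\
&\iff uxu'' \in \mathrm{Suffix}_x(w)\\
&\iff u \in \mathrm{Prefix}_x(\mathrm{Suffix}_x(w)).
\end{aligned}$$

Hence the proof.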
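The sets $\mathrm{Prefix}_x(w)$ and $\mathrm{Suffix}_x(w)$ and the identity of Lemma 2 can be checked on small words with the sketch below. It assumes that all (possibly overlapping) occurrences of $x$ are counted. Helper names are hypothetical.

```python
def occurrences(w: str, x: str) -> list[int]:
    return [i for i in range(len(w) - len(x) + 1) if w.startswith(x, i)]


def prefix_x(w: str, x: str) -> set[str]:
    """Prefix_x(w) = {u : u x u' = w}."""
    return {w[:i] for i in occurrences(w, x)}


def suffix_x(w: str, x: str) -> set[str]:
    """Suffix_x(w) = {s : s' x s = w}."""
    return {w[i + len(x):] for i in occurrences(w, x)}


def prefix_x_lang(S: set[str], x: str) -> set[str]:
    return set().union(*(prefix_x(w, x) for w in S))


def suffix_x_lang(S: set[str], x: str) -> set[str]:
    return set().union(*(suffix_x(w, x) for w in S))


w, x = "abacaba", "a"
lhs = prefix_x_lang(suffix_x(w, x), x)   # Prefix_x(Suffix_x(w))
rhs = suffix_x_lang(prefix_x(w, x), x)   # Suffix_x(Prefix_x(w))
print(lhs == rhs)  # True
```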

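Lemma 3 For any three words $w_1, w_2, w_3$,

$$(w_1 \rtimes w_2) \rtimes w_3 = w_1 \rtimes (w_2 \rtimes w_3);$$

i.e. the operation $1GSCO_x$ is associative over words.

Proof. We have

$$w_1 \rtimes w_2 = \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Suffix}_x(w_2),$$

so

$$\begin{aligned}
(w_1 \rtimes w_2) \rtimes w_3 &= [\mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Suffix}_x(w_2)] \rtimes w_3\\
&= \mathrm{Prefix}_x[\mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Suffix}_x(w_2)] \cdot x \cdot \mathrm{Suffix}_x(w_3)\\
&= [\mathrm{Prefix}_x(\mathrm{Prefix}_x(w_1) \cdot x) \cup \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Prefix}_x(\mathrm{Suffix}_x(w_2))] \cdot x \cdot \mathrm{Suffix}_x(w_3)\\
&= [\mathrm{Prefix}_x(w_1) \cup \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Prefix}_x(\mathrm{Suffix}_x(w_2))] \cdot x \cdot \mathrm{Suffix}_x(w_3)
\end{aligned}$$

[from the definition it follows that $\mathrm{Prefix}_x(\mathrm{Prefix}_x(w) \cdot x) = \mathrm{Prefix}_x(w)$]

$$= \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Suffix}_x(w_3) \cup \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Prefix}_x(\mathrm{Suffix}_x(w_2)) \cdot x \cdot \mathrm{Suffix}_x(w_3). \qquad (7)$$

On the other hand, consider

$$w_2 \rtimes w_3 = \mathrm{Prefix}_x(w_2) \cdot x \cdot \mathrm{Suffix}_x(w_3).$$

Then

$$\begin{aligned}
w_1 \rtimes (w_2 \rtimes w_3) &= \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Suffix}_x[\mathrm{Prefix}_x(w_2) \cdot x \cdot \mathrm{Suffix}_x(w_3)]\\
&= \mathrm{Prefix}_x(w_1) \cdot x \cdot [\mathrm{Suffix}_x(\mathrm{Prefix}_x(w_2)) \cdot x \cdot \mathrm{Suffix}_x(w_3) \cup \mathrm{Suffix}_x(x \cdot \mathrm{Suffix}_x(w_3))]\\
&= \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Suffix}_x(\mathrm{Prefix}_x(w_2)) \cdot x \cdot \mathrm{Suffix}_x(w_3) \cup \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Suffix}_x(w_3)
\end{aligned}$$

[from the definition it follows that $\mathrm{Suffix}_x(x \cdot \mathrm{Suffix}_x(w)) = \mathrm{Suffix}_x(w)$]

$$= \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Prefix}_x(\mathrm{Suffix}_x(w_2)) \cdot x \cdot \mathrm{Suffix}_x(w_3) \cup \mathrm{Prefix}_x(w_1) \cdot x \cdot \mathrm{Suffix}_x(w_3), \qquad (8)$$

by Lemma 2.

From (7) and (8), we have

$$(w_1 \rtimes w_2) \rtimes w_3 = w_1 \rtimes (w_2 \rtimes w_3).$$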
Note 2. Because of the associativity of the operation, we can write

$$(w_1 \bowtie_x w_2) \bowtie_x w_3 = w_1 \bowtie_x (w_2 \bowtie_x w_3) = w_1 \bowtie_x w_2 \bowtie_x w_3.$$

Corollary 4. For any languages $L_1, L_2, L_3$ we can write

$$L_1 \bowtie_x (L_2 \bowtie_x L_3) = (L_1 \bowtie_x L_2) \bowtie_x L_3.$$

Proof. The result is obvious, as

$$L_1 \bowtie_x L_2 = \bigcup_{w_1 \in L_1,\, w_2 \in L_2} (w_1 \bowtie_x w_2).$$
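Lemma 3 and Corollary 4 can likewise be spot-checked by brute force. The Python sketch below (`cross_sets` and `associative_on` are illustrative names, not from the paper) lifts $\bowtie_x$ to sets of words as in Corollary 4 and compares both bracketings for every triple of words over {a, b} of length at most 4, for a few overlaps x, including a multi-symbol one. Passing this bounded check does not replace the proof.

```python
from itertools import product


def cross(w1, w2, x):
    """w1 ⋈_x w2 = Prefix_x(w1) · x · Suffix_x(w2) as a set of words."""
    n = len(x)
    heads = {w1[:i] for i in range(len(w1) - n + 1) if w1.startswith(x, i)}
    tails = {w2[i + n:] for i in range(len(w2) - n + 1) if w2.startswith(x, i)}
    return {h + x + t for h in heads for t in tails}


def cross_sets(left, right, x):
    """L1 ⋈_x L2 = union of w1 ⋈_x w2 over w1 in L1 and w2 in L2."""
    return set().union(*(cross(w1, w2, x) for w1 in left for w2 in right))


def words_up_to(alphabet, max_len):
    return ["".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)]


def associative_on(alphabet="ab", max_len=4, overlaps=("a", "b", "ab")):
    """Compare (w1 ⋈_x w2) ⋈_x w3 with w1 ⋈_x (w2 ⋈_x w3) on all short triples."""
    words = words_up_to(alphabet, max_len)
    for x in overlaps:
        for w1, w2, w3 in product(words, repeat=3):
            left = cross_sets(cross(w1, w2, x), {w3}, x)
            right = cross_sets({w1}, cross(w2, w3, x), x)
            if left != right:
                return False
    return True


print(sorted(cross_sets({"ab"}, {"ba", "bb"}, "b")))  # ['ab', 'aba', 'abb']
print(associative_on())  # True
```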



Lemma 4 For any word $w \in \mathrm{GSCO}_x^i(L)$, there exists a sequence of words $w^0, w^1, \ldots, w^s \in L$ with $s \le 2^i - 1$, such that

$$w \in w^0 \bowtie_x w^1 \bowtie_x \cdots \bowtie_x w^s.$$

Proof. Let $w \in \mathrm{GSCO}_x^i(L)$. We apply induction on i.

For i = 1, $w \in \mathrm{GSCO}_x(L)$, i.e. $w \in \mathrm{GSCO}_x(L, L)$. Hence there exist two words $w_0, w_1 \in L$ such that $w \in w_0 \bowtie_x w_1$. Note that $w \in w \bowtie_x w$ for any word w containing x. Hence, if $w \in L$, we shall write $w \in w \bowtie_x w$.

Let the statement be true for each i = 1, 2, ..., n. We want to show that it holds for i = n + 1 as well.

Let $w \in \mathrm{GSCO}_x^{n+1}(L)$. Then there exist $w'$ and $w'' \in \mathrm{GSCO}_x^n(L)$ such that $w \in w' \bowtie_x w''$. By the induction hypothesis, we can express

$$w' \in w'_0 \bowtie_x w'_1 \bowtie_x \cdots \bowtie_x w'_{2^n-1}, \quad w'_0, w'_1, \ldots, w'_{2^n-1} \in L,$$
$$w'' \in w''_0 \bowtie_x w''_1 \bowtie_x \cdots \bowtie_x w''_{2^n-1}, \quad w''_0, w''_1, \ldots, w''_{2^n-1} \in L.$$

Hence

$$w \in (w'_0 \bowtie_x w'_1 \bowtie_x \cdots \bowtie_x w'_{2^n-1}) \bowtie_x (w''_0 \bowtie_x w''_1 \bowtie_x \cdots \bowtie_x w''_{2^n-1}).$$

By associativity we can write

$$w \in w'_0 \bowtie_x w'_1 \bowtie_x \cdots \bowtie_x w'_{2^n-1} \bowtie_x w''_0 \bowtie_x w''_1 \bowtie_x \cdots \bowtie_x w''_{2^n-1}.$$

So w can be generated by x-crossover of $2^{n+1}$ words (not necessarily distinct) of L. Hence the lemma holds.
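Theorem 4. For any three words $w_1, w_2, w_3 \in \Sigma^*$,

$$\bigcup_{a,b \in \Sigma} (w_1 \bowtie_a (w_2 \bowtie_b w_3)) = \bigcup_{a,b \in \Sigma} ((w_1 \bowtie_a w_2) \bowtie_b w_3).$$

Proof. Using the $\mathrm{Prefix}_x$ and $\mathrm{Suffix}_x$ notation introduced earlier, we can write

$$w_2 \bowtie_b w_3 = \mathrm{Prefix}_b(w_2) \cdot b \cdot \mathrm{Suffix}_b(w_3),$$
$$w_1 \bowtie_a w_2 = \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(w_2).$$

Then

$$\begin{aligned}
w_1 \bowtie_a (w_2 \bowtie_b w_3) &= w_1 \bowtie_a (\mathrm{Prefix}_b(w_2) \cdot b \cdot \mathrm{Suffix}_b(w_3))\\
&= \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(\mathrm{Prefix}_b(w_2)) \cdot b \cdot \mathrm{Suffix}_b(w_3)\\
&\quad \cup \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(\mathrm{Suffix}_b(w_3)).
\end{aligned} \tag{9}$$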



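Similarly, we get

$$\begin{aligned}
(w_1 \bowtie_a w_2) \bowtie_b w_3 &= (\mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(w_2)) \bowtie_b w_3\\
&= \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Prefix}_b(\mathrm{Suffix}_a(w_2)) \cdot b \cdot \mathrm{Suffix}_b(w_3)\\
&\quad \cup \mathrm{Prefix}_b(\mathrm{Prefix}_a(w_1)) \cdot b \cdot \mathrm{Suffix}_b(w_3).
\end{aligned} \tag{10}$$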






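The statement of the theorem can be restated as

$$\Big(\bigcup_{a=b} w_1 \bowtie_a (w_2 \bowtie_b w_3)\Big) \cup \Big(\bigcup_{a \neq b} w_1 \bowtie_a (w_2 \bowtie_b w_3)\Big) = \Big(\bigcup_{a=b} (w_1 \bowtie_a w_2) \bowtie_b w_3\Big) \cup \Big(\bigcup_{a \neq b} (w_1 \bowtie_a w_2) \bowtie_b w_3\Big).$$

That is, to prove the theorem it is enough to prove

$$\begin{aligned}
&\Big(\bigcup_{a=b} w_1 \bowtie_a (w_2 \bowtie_b w_3)\Big) \cup \Big(\bigcup_{a \neq b} \big(\mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(\mathrm{Prefix}_b(w_2)) \cdot b \cdot \mathrm{Suffix}_b(w_3) \cup \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(\mathrm{Suffix}_b(w_3))\big)\Big)\\
&= \Big(\bigcup_{a=b} (w_1 \bowtie_a w_2) \bowtie_b w_3\Big) \cup \Big(\bigcup_{a \neq b} \big(\mathrm{Prefix}_b(\mathrm{Prefix}_a(w_1)) \cdot b \cdot \mathrm{Suffix}_b(w_3) \cup \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Prefix}_b(\mathrm{Suffix}_a(w_2)) \cdot b \cdot \mathrm{Suffix}_b(w_3)\big)\Big).
\end{aligned} \tag{11}$$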

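Let

$$\begin{aligned}
A &= \bigcup_{a=b} w_1 \bowtie_a (w_2 \bowtie_b w_3)\\
&= \bigcup_{a} \{\mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(w_3) \cup \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(\mathrm{Prefix}_a(w_2)) \cdot a \cdot \mathrm{Suffix}_a(w_3)\},\\
C &= \bigcup_{a=b} (w_1 \bowtie_a w_2) \bowtie_b w_3\\
&= \bigcup_{a} \{\mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(w_3) \cup \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Prefix}_a(\mathrm{Suffix}_a(w_2)) \cdot a \cdot \mathrm{Suffix}_a(w_3)\}.
\end{aligned}$$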

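Using equation (9), we define

$$B_1 = \bigcup_{a \neq b} \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(\mathrm{Prefix}_b(w_2)) \cdot b \cdot \mathrm{Suffix}_b(w_3),$$
$$B_2 = \bigcup_{a \neq b} \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(\mathrm{Suffix}_b(w_3)).$$

Using equation (10), we define

$$D_1 = \bigcup_{a \neq b} \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Prefix}_b(\mathrm{Suffix}_a(w_2)) \cdot b \cdot \mathrm{Suffix}_b(w_3),$$
$$D_2 = \bigcup_{a \neq b} \mathrm{Prefix}_b(\mathrm{Prefix}_a(w_1)) \cdot b \cdot \mathrm{Suffix}_b(w_3).$$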

So, from equation (11), it is sufficient to prove that

$$A \cup B_1 \cup B_2 = C \cup D_1 \cup D_2. \tag{12}$$

We prove the next two lemmas, which are required to prove equation (12).
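Lemma 5 $B_2 \subseteq A$; $D_2 \subseteq C$.

Proof. We claim that, for a given word w, $\mathrm{Suffix}_a(\mathrm{Suffix}_b(w)) \subseteq \mathrm{Suffix}_a(w)$. Let

$$\begin{aligned}
& u \in \mathrm{Suffix}_a(\mathrm{Suffix}_b(w))\\
\implies\ & \exists u' \in \Sigma^* \text{ such that } u'au \in \mathrm{Suffix}_b(w)\\
\implies\ & \exists u'' \in \Sigma^* \text{ such that } u''bu'au = w\\
\implies\ & w = (u''bu')au\\
\implies\ & u \in \mathrm{Suffix}_a(w).
\end{aligned}$$

Similarly, we can prove that $\mathrm{Prefix}_a(\mathrm{Prefix}_b(w)) \subseteq \mathrm{Prefix}_a(w)$. Note that the reverse inclusions do not hold in general. Therefore

$$\begin{aligned}
\mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(\mathrm{Suffix}_b(w_3)) &\subseteq \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(w_3)\\
&\subseteq \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(w_3) \cup \mathrm{Prefix}_a(w_1) \cdot a \cdot \mathrm{Suffix}_a(\mathrm{Prefix}_a(w_2)) \cdot a \cdot \mathrm{Suffix}_a(w_3).
\end{aligned}$$

Taking the union on both sides over $a \neq b$, we get $B_2 \subseteq A$. Similarly, we can prove that $D_2 \subseteq C$. Hence the proof of the lemma.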

In Lemma 3, replacing x by a single symbol a, we get

$$A = C. \tag{13}$$

By Lemma 5, we have $B_2 \subseteq A$ and $D_2 \subseteq C$. Hence, from equation (12), it is sufficient to prove that

$$A \cup B_1 = C \cup D_1.$$

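Lemma 6 For any word w,

$$\mathrm{Prefix}_b(\mathrm{Suffix}_a(w)) = \mathrm{Suffix}_a(\mathrm{Prefix}_b(w)).$$

Proof. The proof follows the same line of argument as that of Lemma 2. Let $u \in \mathrm{Prefix}_b(\mathrm{Suffix}_a(w))$. Then

$$\begin{aligned}
u \in \mathrm{Prefix}_b(\mathrm{Suffix}_a(w)) &\iff \exists u' \in \Sigma^* \text{ such that } ubu' \in \mathrm{Suffix}_a(w)\\
&\iff \exists u'' \in \Sigma^* \text{ such that } u''aubu' = w\\
&\iff u''au \in \mathrm{Prefix}_b(w)\\
&\iff u \in \mathrm{Suffix}_a(\mathrm{Prefix}_b(w)).
\end{aligned}$$

Hence the proof.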
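Lemma 6, with possibly distinct symbols a and b, admits the same kind of exhaustive check as Lemma 2. The sketch below is an illustration under our own naming (`lemma6_holds` is hypothetical) that tests all words over {a, b, c} of length at most 6 and all ordered pairs of symbols (a, b), including a = b. Both sides collect exactly the factors u that occur in w as "a u b".

```python
from itertools import product


def prefix_x(w, x):
    """Prefix_x(w) = {u : w = u x v for some v}."""
    return {w[:i] for i in range(len(w) - len(x) + 1) if w.startswith(x, i)}


def suffix_x(w, x):
    """Suffix_x(w) = {s : w = s' x s for some s'}."""
    return {w[i + len(x):] for i in range(len(w) - len(x) + 1) if w.startswith(x, i)}


def lemma6_holds(alphabet="abc", max_len=6):
    """Check Prefix_b(Suffix_a(w)) = Suffix_a(Prefix_b(w)) for all short words w."""
    for n in range(max_len + 1):
        for w in map("".join, product(alphabet, repeat=n)):
            for a, b in product(alphabet, repeat=2):
                lhs = set().union(*(prefix_x(s, b) for s in suffix_x(w, a)))
                rhs = set().union(*(suffix_x(p, a) for p in prefix_x(w, b)))
                if lhs != rhs:
                    return False
    return True


# The factors u with "a u b" in cabcab are u = "" (from "ab") and u = "bca".
w = "cabcab"
print(sorted(set().union(*(prefix_x(s, "b") for s in suffix_x(w, "a")))))  # ['', 'bca']
print(lemma6_holds())  # True
```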

Using Lemma 6, it is obvious that

$$B_1 = D_1. \tag{14}$$

Combining equations (13) and (14), we get the required result.

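Corollary 5. For any three words $w_1, w_2, w_3 \in \Sigma^*$,

$$w_1 \bowtie (w_2 \bowtie w_3) = (w_1 \bowtie w_2) \bowtie w_3.$$

Proof. By Corollary 1, we have

$$\begin{aligned}
w_1 \bowtie (w_2 \bowtie w_3) &= w_1 \bowtie \Big(\bigcup_{b \in \Sigma} w_2 \bowtie_b w_3\Big)\\
&= \bigcup_{a \in \Sigma} w_1 \bowtie_a \Big(\bigcup_{b \in \Sigma} w_2 \bowtie_b w_3\Big)\\
&= \bigcup_{a,b \in \Sigma} w_1 \bowtie_a (w_2 \bowtie_b w_3).
\end{aligned}$$

Similarly, from the right-hand side we get

$$(w_1 \bowtie w_2) \bowtie w_3 = \bigcup_{a,b \in \Sigma} (w_1 \bowtie_a w_2) \bowtie_b w_3.$$

Using the previous theorem, we have the required equality.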

Theorem 5. Any word $w \in \mathrm{1GSCO}_x^i(L)$ can be written as $w \in \mathrm{1GSCO}_x(\mathrm{1GSCO}_x^j(L), L)$ for some j.

Proof. Let $w \in \mathrm{1GSCO}_x^i(L)$. By Lemma 4, we get a sequence of words $w_1, w_2, \ldots, w_{n+1} \in L$ such that w can be written in the form

$$\begin{aligned}
& w \in w_1 \bowtie_x w_2 \bowtie_x \cdots \bowtie_x w_n \bowtie_x w_{n+1}\\
\implies\ & w \in (w_1 \bowtie_x w_2 \bowtie_x \cdots \bowtie_x w_n) \bowtie_x w_{n+1} \quad \text{by associativity}\\
\therefore\ & w \in \mathrm{1GSCO}_x(w_1 \bowtie_x \cdots \bowtie_x w_n,\ w_{n+1})\\
\implies\ & w \in \mathrm{1GSCO}_x(\mathrm{1GSCO}_x^j(L), L).
\end{aligned}$$

Hence the theorem.

The above theorem suggests that the GSCO closure of L can be defined in another form, which we call the restricted GSCO closure of L.

Definition 7 The restricted closure of GSCO, denoted by rGSCO*(L), is defined recursively as follows:

$$\mathrm{rGSCO}^0(L) = L,$$
$$\mathrm{rGSCO}^{i+1}(L) = \mathrm{GSCO}(\mathrm{rGSCO}^i(L), L), \quad i \ge 0,$$
$$\mathrm{rGSCO}^*(L) = \bigcup_{i \ge 0} \mathrm{rGSCO}^i(L).$$

The main difference between uGSCO* and rGSCO* is that, in the latter case, crossover takes place between a word produced so far and a word of L, whereas in the former case crossover takes place between any pair of words generated so far. Interestingly, the following theorem shows that they generate the same language.

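Theorem 6. r1GSCO*(L) = u1GSCO*(L).
Proof. By definition it follows that

$$\mathrm{r1GSCO}^*(L) \subseteq \mathrm{u1GSCO}^*(L).$$

Hence it is enough to show that

$$\mathrm{u1GSCO}^*(L) \subseteq \mathrm{r1GSCO}^*(L).$$

Let $w \in \mathrm{u1GSCO}^*(L)$. Then $w \in \mathrm{u1GSCO}^i(L)$ for some i. Hence, by Lemma 4, there exist $w_0, w_1, \ldots, w_{2^i-1} \in L$ such that

$$\begin{aligned}
& w \in (\cdots((w_0 \bowtie_x w_1) \bowtie_x (w_2 \bowtie_x w_3)) \bowtie_x \cdots \bowtie_x (w_{2^i-2} \bowtie_x w_{2^i-1}) \cdots)\\
\implies\ & w \in w_0 \bowtie_x w_1 \bowtie_x \cdots \bowtie_x w_{2^i-1} \quad \text{since } \bowtie_x \text{ is associative}\\
\implies\ & w \in (\cdots((w_0 \bowtie_x w_1) \bowtie_x w_2) \bowtie_x \cdots \bowtie_x w_{2^i-2}) \bowtie_x w_{2^i-1}\\
\implies\ & w \in \mathrm{GSCO}(\cdots \mathrm{GSCO}(\mathrm{GSCO}(\mathrm{GSCO}(w_0, w_1), w_2), w_3) \cdots, w_{2^i-1})\\
\implies\ & w \in \mathrm{GSCO}(\cdots(\mathrm{GSCO}(\mathrm{GSCO}(L), L) \cdots), L)\\
\implies\ & w \in \mathrm{r1GSCO}^{2^i-1}(L).
\end{aligned}$$

Hence the theorem.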
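Theorem 6 can be illustrated on a small axiom whose closure is finite. The Python sketch below assumes that GSCO crosses words at single-symbol overlaps drawn from the symbols of L (function names are ours). In `closure`, the unrestricted mode crosses any two words produced so far, while the restricted mode only crosses a produced word (left) with an axiom (right), as in Definition 7; the fixed point of $K \mapsto K \cup \mathrm{GSCO}(K, L)$ equals the union of the layers $\mathrm{rGSCO}^i(L)$. The cut-off `max_len` is a hypothetical guard against infinite closures. For L = {ab, bc} both modes give {ab, bc, abc, b}; note that crossover can shorten words, e.g. $bc \bowtie_b ab = \{b\}$.

```python
from itertools import product


def cross(w1, w2, x):
    """w1 ⋈_x w2 = Prefix_x(w1) · x · Suffix_x(w2)."""
    n = len(x)
    heads = {w1[:i] for i in range(len(w1) - n + 1) if w1.startswith(x, i)}
    tails = {w2[i + n:] for i in range(len(w2) - n + 1) if w2.startswith(x, i)}
    return {h + x + t for h in heads for t in tails}


def gsco(left, right, overlaps):
    """All crossovers w1 ⋈_r w2 with w1 in `left`, w2 in `right` and r in `overlaps`."""
    out = set()
    for w1, w2 in product(left, right):
        for r in overlaps:
            out |= cross(w1, w2, r)
    return out


def closure(lang, restricted, max_len=10):
    """Fixed point of K -> K ∪ GSCO(K, L) (restricted) or K -> K ∪ GSCO(K, K) (unrestricted).

    Words longer than the illustrative cut-off max_len are discarded."""
    overlaps = {c for w in lang for c in w}  # assumption: R = symbols occurring in L
    result = set(lang)
    while True:
        right = lang if restricted else result
        new = {w for w in gsco(result, right, overlaps) if len(w) <= max_len}
        if new <= result:
            return result
        result |= new


L = {"ab", "bc"}
print(sorted(closure(L, restricted=True)))  # ['ab', 'abc', 'b', 'bc']
print(closure(L, restricted=True) == closure(L, restricted=False))  # True
```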

Note 3. We can also prove the above theorem by using the closure property of r1GSCO*(L) under 1GSCO.

Because of this theorem, we no longer distinguish rGSCO*(L) and uGSCO*(L), and we simply refer to both as GSCO*(L). The proof also shows that we can construct GSCO*(L) as follows:

$$\mathrm{GSCO}^{n+1}(L) = \mathrm{GSCO}(L, \mathrm{GSCO}^n(L)).$$
6 Regularity of GSCO

Definition 8 (Base of a word) The base of a word w, denoted by B(w), is a minimal set of words whose iterated crossover generates w in such a way that every element of B(w) takes part in GSCO at least once:

$$B(w) = \{u_1, u_2, \ldots, u_k : w \in \mathrm{GSCO}^*(\{u_1, u_2, \ldots, u_k\})\}.$$

Here the word 'minimal' is used in the sense that if there exists $B'(w) = \{u'_1, u'_2, \ldots, u'_k\}$ with $w \in \mathrm{GSCO}^*(\{u'_1, u'_2, \ldots, u'_k\})$ such that $B'(w) \subseteq B(w)$, then $B(w) = B'(w)$.

B(w) is a minimal set of words that generates w by the process of GSCO. B(w) is a finite set for any word w, but it need not be unique. For example, B(abbbc) = {ab, bb, bc} and B(abbbc) = {abb, bc}. B(w) is called nB(w) if all the words of B(w) are of length n. For n > 2, nB(w) is not unique. For example, the word w = abbbc has two 4B sets: 4B(abbbc) = {abbb, bbbc} as well as {abba, bbbc}. For words w such that |w| = 1, $w \in 1B(w)$. It is interesting to note that 2B(w) is unique for a word. For $w = a_1 a_2 \cdots a_k$, $a_i \in \Sigma$, $i = 1, 2, \ldots, k$,

$$2B(w) = \{a_1 a_2, a_2 a_3, \ldots, a_{k-1} a_k\}.$$

For $a \in \Sigma$, 2B(a) is taken to be the set {a}, and $2B(\varepsilon) = \{\varepsilon\}$. We define the 2B set of a language L as $2B(L) = \bigcup_{w \in L} 2B(w)$. For example, $2B(a^+) = \{a, aa\}$.
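The set 2B(w), and the fact that w can be rebuilt from it by chaining single-symbol crossovers $a_1a_2 \bowtie_{a_2} a_2a_3 \bowtie_{a_3} \cdots \bowtie_{a_{k-1}} a_{k-1}a_k$ (the chain used in Claim 3(a) and Theorem 9), are easy to compute. The Python below is a sketch with hypothetical helper names; following the convention above, 2B of a one-letter word is the word itself and 2B(ε) = {ε}. Since the sample passed to `two_b_lang` is finite, it only approximates 2B of an infinite language.

```python
def cross(w1, w2, x):
    """w1 ⋈_x w2 = Prefix_x(w1) · x · Suffix_x(w2)."""
    n = len(x)
    heads = {w1[:i] for i in range(len(w1) - n + 1) if w1.startswith(x, i)}
    tails = {w2[i + n:] for i in range(len(w2) - n + 1) if w2.startswith(x, i)}
    return {h + x + t for h in heads for t in tails}


def two_b(w):
    """2B(w): the length-2 factors of w (2B(a) = {a}, 2B(ε) = {ε} by convention)."""
    if len(w) <= 1:
        return {w}
    return {w[i:i + 2] for i in range(len(w) - 1)}


def two_b_lang(words):
    """2B(L) = union of 2B(w) over a finite sample of words of L."""
    return set().union(*(two_b(w) for w in words))


def rebuild(w):
    """Chain a1a2 ⋈_{a2} a2a3 ⋈_{a3} ... ⋈_{a_{k-1}} a_{k-1}a_k, for |w| >= 2."""
    current = {w[:2]}
    for i in range(1, len(w) - 1):
        piece = w[i:i + 2]
        current = set().union(*(cross(c, piece, w[i]) for c in current))
    return current


print(sorted(two_b("abbbc")))  # ['ab', 'bb', 'bc']
print(sorted(two_b_lang("a" * n for n in range(1, 6))))  # ['a', 'aa']
print(sorted(rebuild("abbbc")))  # ['abbbc', 'abbc', 'abc']
```

The chain always contains w itself, although it may also produce other words with the same first symbol, last symbol and 2B set, as the last line shows.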

Theorem 7. For a language L, GSCO*(L) is a regular language.

Proof. Let Σ be the alphabet of L. We define a relation R over $\Sigma^* \times \Sigma^*$ such that*

$$xRy \iff 2B(x) = 2B(y);\ E_1(x) = E_1(y);\ E_{|x|}(x) = E_{|y|}(y).$$

Claim 1: R is a right-invariant (with respect to concatenation) equivalence relation.

R is reflexive, since xRx, and symmetric, since $xRy \implies yRx$. If xRy and yRz, we have $2B(x) = 2B(y)$, $E_1(x) = E_1(y)$, $E_{|x|}(x) = E_{|y|}(y)$ and $2B(y) = 2B(z)$, $E_1(y) = E_1(z)$, $E_{|y|}(y) = E_{|z|}(z)$. Hence $2B(x) = 2B(z)$, $E_1(x) = E_1(z)$ and $E_{|x|}(x) = E_{|z|}(z)$, which gives the transitivity of R. Hence R is an equivalence relation.

Let xRy. So

$$2B(x) = 2B(y);\quad E_1(x) = E_1(y);\quad E_{|x|}(x) = E_{|y|}(y). \tag{15}$$

Let z be any word. Then

$$2B(xz) = 2B(x) \cup \{E_{|x|}(x) \cdot E_1(z)\} \cup 2B(z).$$

Similarly,

$$2B(yz) = 2B(y) \cup \{E_{|y|}(y) \cdot E_1(z)\} \cup 2B(z).$$

By (15) we have

$$2B(xz) = 2B(yz);\quad E_1(xz) = E_1(yz);\quad E_{|x|+|z|}(xz) = E_{|y|+|z|}(yz),$$

which implies xzRyz. Hence R is right invariant with respect to concatenation.
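Claim 1 can be spot-checked in code. In the hedged Python sketch below, `class_key` (our name) returns the triple (first symbol, 2B, last symbol) that the proof uses to define R; words of length at most 1 get a key that no other word shares, matching the singleton classes described in Claim 2. `right_invariant` then checks, on all words over {a, b} of length at most 5, that equal keys for x and y imply equal keys for xz and yz for every short extension z. The key of xz is computed directly from the word.

```python
from itertools import product


def two_b(w):
    """2B(w): length-2 factors of w (2B(a) = {a}, 2B(ε) = {ε} by convention)."""
    if len(w) <= 1:
        return {w}
    return {w[i:i + 2] for i in range(len(w) - 1)}


def class_key(w):
    """(E_1(w), 2B(w), E_|w|(w)); for |w| <= 1 this is (w, {w}, w), shared by no other word."""
    if len(w) <= 1:
        return (w, frozenset({w}), w)
    return (w[0], frozenset(two_b(w)), w[-1])


def words_up_to(alphabet, max_len):
    return ["".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)]


def right_invariant(alphabet="ab", max_len=5, ext_len=2):
    """Check that xRy implies xzRyz for every extension z with |z| <= ext_len."""
    words = words_up_to(alphabet, max_len)
    exts = words_up_to(alphabet, ext_len)
    for x, y in product(words, repeat=2):
        if class_key(x) == class_key(y):
            if any(class_key(x + z) != class_key(y + z) for z in exts):
                return False
    return True


key = class_key("abbbb")
print(key[0], sorted(key[1]), key[2])  # a ['ab', 'bb'] b
print(class_key("aab") == class_key("aaab"))  # True
print(right_invariant())  # True
```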

Claim 2: The number of equivalence classes of R over Σ* is finite.

Every equivalence class of Σ* has a 2B set, a symbol s ∈ Σ and a symbol e ∈ Σ such that the elements of the equivalence class are exactly the elements of $\mathrm{GSCO}^*(2B) \cap s\Sigma^*e$. Every equivalence class is parameterised by a 2B set, a symbol s (the starting symbol of the words in that class) and a symbol e (the ending symbol of the words in that class). We denote an equivalence class by (s, 2B, e), where 2B is a nonempty subset of $\Sigma^2$ and s, e ∈ Σ. For example, if Σ = {a, b}, abbbb is in the equivalence class (a, {ab, bb}, b). A word w ∈ Σ* with |w| = 1 is related under R only to itself and to no other word of Σ*. That is, such a word lies in an equivalence class that contains only w.



* $E_i(w)$ is the symbol in the i-th position of the word w.
The word a ∈ Σ is present in one equivalence class, and no other element is present in that class. Similarly, the element b is present in one equivalence class. We denote the equivalence classes that contain only one element, of length one, by (a, {a}, a), a ∈ Σ. The empty word ε ∈ Σ* lies in an equivalence class that contains no other element of Σ*. Thus we have two categories of equivalence classes.
Category I: (s, 2B, e), with 2B a nonempty subset of $\Sigma^2$ and s, e ∈ Σ.
Category II: (a, {a}, a), $a \in \Sigma \cup \{\varepsilon\}$.

For every equivalence class of Category I, we have a triple (s, 2B, e), with 2B a nonempty subset of $\Sigma^2$ and s, e ∈ Σ. Conversely, for every such triple we have an equivalence class of R (some equivalence classes of R over Σ* may be empty). That is, the triple (s, 2B, e) characterises an equivalence class of R. If |Σ| = n, then $\Sigma^2$ has $n^2$ elements and hence $2^{n^2} - 1$ nonempty subsets, so the number of such triples is $n^2(2^{n^2} - 1)$. That is, under Category I the total number of equivalence classes of R over Σ* is $n^2(2^{n^2} - 1)$. Under Category II, the number of equivalence classes is the number of triples of the form (a, {a}, a) with $a \in \Sigma \cup \{\varepsilon\}$, which is n + 1. The total number of equivalence classes of R is therefore $n^2(2^{n^2} - 1) + (n + 1)$, which is finite since n is finite.

Claim 3: GSCO*(L) is the union of some of the equivalence classes of R.

Since $\mathrm{GSCO}^*(L) \subseteq \Sigma^*$, the elements of GSCO*(L) are spread over different equivalence classes of R over Σ*. Note that $\varepsilon \notin \mathrm{GSCO}^*(L)$. If a symbol a ∈ Σ is such that $a \in \mathrm{GSCO}^*(L)$, then a lies in the equivalence class (a, {a}, a), and no element other than a lies in (a, {a}, a). So an equivalence class (a, {a}, a) of Category II is contained in GSCO*(L) whenever $a \in \mathrm{GSCO}^*(L)$.

We prove the following claim to show that if an equivalence class of Category I shares at least one word with GSCO*(L), then that equivalence class is fully contained in GSCO*(L).

Claim 3(a): If $\mathrm{GSCO}^*(L) \cap (s, 2B, e) \neq \emptyset$ for some s, e, 2B, then $(s, 2B, e) \subseteq \mathrm{GSCO}^*(L)$.

We have to prove $(s, 2B, e) \subseteq \mathrm{GSCO}^*(L)$. Suppose otherwise, that is, there exists a word w such that |w| > 1, $w \in (s, 2B, e)$ and $w \notin \mathrm{GSCO}^*(L)$. Since $w \in (s, 2B, e)$, we have $w \in \mathrm{GSCO}^*(2B) \cap s\Sigma^*e$. Let $w = a_1 a_2 \cdots a_n$, |w| > 1. Here $s = a_1$ and $e = a_n$, and $w \in a_1a_2 \bowtie a_2a_3 \bowtie \cdots \bowtie a_{n-1}a_n$ with $a_ia_{i+1} \in 2B$, i = 1, 2, ..., n − 1. We want to show that there exists a sequence of words in GSCO*(L) which, by iterated crossover, generates w. The following claim helps us obtain such a sequence.

Claim 3(b):

1. There exist words $w_i = u_i a_i a_{i+1} v_i \in \mathrm{GSCO}^*(L)$, for some $u_i, v_i \in \Sigma^*$.

2. There exists a word $w_1 \in \mathrm{GSCO}^*(L)$ such that $a_1a_2 \in \mathrm{Prefix}(w_1)$.

3. There exists a word $w_{n-1} \in \mathrm{GSCO}^*(L)$ such that $a_{n-1}a_n \in \mathrm{Suffix}(w_{n-1})$.

The elements of 2B (the set under consideration in Claim 3(a)) are in 2B(GSCO*(L)). That is, there exists a word of the form $ua_1a_2v \in \mathrm{GSCO}^*(L)$.

Since the first symbol of w is $a_1$ ($s = a_1$), there exists a word $a_1t \in \mathrm{GSCO}^*(L)$, $t \in \Sigma^*$. Since GSCO*(L) is a crossover language and $a_1t, ua_1a_2v \in \mathrm{GSCO}^*(L)$, we get $a_1a_2v \in a_1t \bowtie ua_1a_2v \subseteq \mathrm{GSCO}^*(L)$. We write $w_1 = a_1a_2v \in \mathrm{GSCO}^*(L)$. Similarly, there exists a word $w_{n-1} = v'a_{n-1}a_n \in \mathrm{GSCO}^*(L)$, for some $v' \in \Sigma^*$.

The set 2B under consideration contains all the subwords of length 2 of some words in GSCO*(L) (that is, the set 2B contains all the subwords of length 2 of the words present in the equivalence class (s, 2B, e)). For each $a_ia_{i+1} \in 2B$, i = 1, 2, ..., n − 2, there exists a word $w_i = u_ia_ia_{i+1}v_i \in \mathrm{GSCO}^*(L)$, for some $u_i, v_i \in \Sigma^*$ (the $w_i$ need not be distinct). Thus we have a sequence of words $w_i \in \mathrm{GSCO}^*(L)$, which proves Claim 3(b).

Clearly, $a_1a_2a_3 \cdots a_n \in a_1a_2v \bowtie u_2a_2a_3v_2 \bowtie \cdots \bowtie v'a_{n-1}a_n$. That is, $w \in w_1 \bowtie w_2 \bowtie \cdots \bowtie w_{n-1}$ with $w_i \in \mathrm{GSCO}^*(L)$. Thus $w \in \mathrm{GSCO}^*(L)$, which contradicts the assumption that $w \notin \mathrm{GSCO}^*(L)$. Hence we have Claim 3(a).

Thus, for every $a \in \mathrm{GSCO}^*(L)$ with a ∈ Σ, the equivalence class of Category II that contains a, viz. (a, {a}, a), is fully contained in GSCO*(L), since (a, {a}, a) contains only the element a. For every $w \in \mathrm{GSCO}^*(L)$ with |w| > 1, the equivalence class (of Category I) that contains w, viz. (s, 2B, e), is fully contained in GSCO*(L). Thus

$$\mathrm{GSCO}^*(L) = \Big(\bigcup_{a \in \mathrm{GSCO}^*(L) \cap \Sigma} (a, \{a\}, a)\Big) \cup \Big(\bigcup_{w \in \mathrm{GSCO}^*(L) \cap (s, 2B, e)} (s, 2B, e)\Big).$$

We know that R is of finite index. Hence GSCO*(L) is the union of some of the equivalence classes of a right-invariant equivalence relation of finite index. Thus, by the Myhill-Nerode theorem, GSCO*(L) is regular.

The converse of this theorem is not true, i.e. not every regular language can be obtained by using GSCO. We give a counterexample in Example 7.

Definition 9 A language L is said to be a crossover language if there exists a set L' such that GSCO*(L') = L. That is, L is called a crossover language if L can be obtained by the iterated GSCO process from some set L'.

Example 6. 1. L = {a, b} is a crossover language, since GSCO*(L) = {a, b}.

2. $L = a^+b^+$ is a crossover language, since $\mathrm{GSCO}^*(\{aabb, aaabbb\}) = a^+b^+$.

Remark 1 All crossover languages are regular, and no crossover language contains ε (the word of length 0).

Theorem 8. A language L is a crossover language if and only if L is closed with respect to the operation GSCO.

Proof. Suppose L is a crossover language. Then there exists a language L' such that GSCO*(L') = L. Let x, y ∈ L. Then x, y ∈ GSCO*(L'), and $\mathrm{GSCO}(x, y) \subseteq \mathrm{GSCO}^*(L')$, since GSCO*(L') is the transitive closure of GSCO. Hence $\mathrm{GSCO}(x, y) \subseteq L$, since GSCO*(L') = L.

For the other direction:

Suppose L is closed with respect to GSCO, i.e. $\mathrm{GSCO}(x, y) \subseteq L$ for every x, y ∈ L. Then $\mathrm{GSCO}(\mathrm{GSCO}(x, y), z) \subseteq L$ for all x, y, z ∈ L, that is, $\mathrm{GSCO}^2(L) \subseteq L$. Continuing like this, we have $\mathrm{GSCO}^i(L) \subseteq L$ for all i > 0. Then $\bigcup_i \mathrm{GSCO}^i(L) \subseteq L$, and $L \subseteq \bigcup_i \mathrm{GSCO}^i(L)$. Hence GSCO*(L) = L, which implies that L is a crossover language.

Example 7. The language $L = \{a^{2n} : n \ge 1\}$ is a regular language. However, it is not a crossover language, as it is not closed under the GSCO operation: $a^3 \in a^2 \bowtie a^2$ but $a^3 \notin L$.
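By Theorem 8, one way to show that a language is not a crossover language is to exhibit two of its words whose crossover leaves it, as in Example 7. The Python sketch below (the helper `find_violation` is hypothetical, and the membership predicate must be supplied by the user) searches a finite sample of L for such a pair. Finding a violation refutes closure, whereas finding none on a finite sample proves nothing about L as a whole.

```python
from itertools import product


def cross(w1, w2, x):
    """w1 ⋈_x w2 = Prefix_x(w1) · x · Suffix_x(w2)."""
    n = len(x)
    heads = {w1[:i] for i in range(len(w1) - n + 1) if w1.startswith(x, i)}
    tails = {w2[i + n:] for i in range(len(w2) - n + 1) if w2.startswith(x, i)}
    return {h + x + t for h in heads for t in tails}


def find_violation(sample, member, overlaps):
    """Return (w1, w2, r, w) with w1, w2 in sample and w in w1 ⋈_r w2 but not member(w)."""
    for w1, w2 in product(sample, repeat=2):
        for r in overlaps:
            for w in sorted(cross(w1, w2, r)):
                if not member(w):
                    return (w1, w2, r, w)
    return None


def even_as(w):
    """Membership in {a^(2n) : n >= 1}."""
    return len(w) >= 2 and len(w) % 2 == 0 and set(w) == {"a"}


def a_plus_b_plus(w):
    """Membership in a+b+: a nonempty block of a's followed by a nonempty block of b's."""
    i = len(w) - len(w.lstrip("a"))
    return 0 < i < len(w) and set(w[i:]) == {"b"}


print(sorted(cross("aa", "aa", "a")))  # ['a', 'aa', 'aaa']
print(find_violation(["aa", "aaaa"], even_as, ["a"]))  # ('aa', 'aa', 'a', 'a')
print(find_violation(["ab", "aab", "abb", "aabb"], a_plus_b_plus, ["a", "b"]))  # None
```

Besides $a^3$, the crossover $a^2 \bowtie a^2$ also yields the single word a, which is likewise outside L.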

Theorem 9. For any crossover language L, there exist three finite sets $S \subseteq \Sigma$, $E \subseteq \Sigma$, $B \subseteq \Sigma^2$ such that

$$L = (\mathrm{GSCO}^*(B) \cap S\Sigma^*E) \cup (L \cap \Sigma).$$

Proof. Given a crossover language L, L does not contain ε. Since L is regular, we can find a right-linear grammar G = (N, T, P, S) that generates L. Without loss of generality, let G be a grammar without ε-productions (since L does not contain ε), unit productions and useless symbols. We construct a set B (called the base set of L) as follows.

1. For every production S → a ∈ P, a ∈ T, include a ∈ B.

2. For every pair of productions X → aA, A → bB ∈ P; a, b ∈ T; A, B, X ∈ N; include ab ∈ B.

3. For every pair of productions X → aA, A → b ∈ P; a, b ∈ T; A, X ∈ N; include ab ∈ B.

The construction shows that the set B contains all the subwords of length 2 of L. We construct the set S (the start symbol set) and the set E (the end symbol set) as follows.

1. For a production S → a, include a ∈ S.

2. For a production S → aA, include a ∈ S.

3. For a production A → a, include a ∈ E.
S and E contain the first and the last symbols of the words of L, respectively.

Case I: All the words in L of length greater than or equal to 2 are in $\mathrm{GSCO}^*(B) \cap S\Sigma^*E$, and vice versa.
Part I: Let $w = a_1a_2 \cdots a_n \in L$, |w| ≥ 2. Then $w \in a_1a_2 \bowtie a_2a_3 \bowtie \cdots \bowtie a_{n-1}a_n$. Since w ∈ L, $a_1 \in S$ and $a_n \in E$. Here the $a_ia_{i+1}$, i = 1, 2, ..., n − 1, are subwords of L of length 2, which implies $a_ia_{i+1} \in B$ for all i. So $w \in \mathrm{GSCO}^*(B)$ and $w \in S\Sigma^*E$. Thus $w \in \mathrm{GSCO}^*(B) \cap S\Sigma^*E$.

Part II: Let $w \in \mathrm{GSCO}^*(B) \cap S\Sigma^*E$, with $w = a_1a_2 \cdots a_n$, $a_1 \in S$ and $a_n \in E$. Then $w \in a_1a_2 \bowtie a_2a_3 \bowtie \cdots \bowtie a_{n-1}a_n$, with $a_ia_{i+1} \in B$, 1 ≤ i ≤ n − 1. We now claim that there exists a word $w_1 \in L$ such that $a_1a_2 \in \mathrm{Prefix}(w_1)$. Here $a_1a_2 \in B$. Since B contains all the subwords of L of length 2, there exists a word of the form $ua_1a_2v \in L$, u, v ∈ Σ*. Since $a_1 \in S$, there exists a word $a_1t \in L$. Further, $a_1t \bowtie ua_1a_2v \subseteq L$, so $a_1a_2v \in L$. Thus there exists a word $w_1 \in L$ whose prefix is $a_1a_2$. Similarly, we can prove that there exists a word $w_{n-1} = u_{n-1}a_{n-1}a_n$ in L. Since B contains all the subwords of L of length 2, for each $a_ia_{i+1}$, i = 2, 3, ..., n − 2, there exists a word $w_i = u_ia_ia_{i+1}v_i \in L$. Thus we have a sequence of words $w_i \in L$, i = 1, 2, ..., n − 1. Clearly, $a_1 \cdots a_n \in a_1a_2v \bowtie u_2a_2a_3v_2 \bowtie \cdots \bowtie u_{n-1}a_{n-1}a_n$. That is, $w \in w_1 \bowtie w_2 \bowtie \cdots \bowtie w_{n-1}$, with $w_1, w_2, \ldots, w_{n-1} \in L$. Since L is a crossover language, $w \in \mathrm{GSCO}^*(L) \subseteq L$. Hence

$w \in L$.

Case II: All the words in L of length 1 are in $L \cap \Sigma$, and vice versa.

Since $\mathrm{GSCO}^*(B) \cap S\Sigma^*E$ contains only words of length at least 2, it is clear that the words w ∈ L of length 1 are in $L \cap \Sigma$.
Hence

$$L = (\mathrm{GSCO}^*(B) \cap S\Sigma^*E) \cup (L \cap \Sigma).$$
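Theorem 9 turns membership in a crossover language into a finite-state check: a word of length at least 2 is in L exactly when it starts in S, ends in E and all its length-2 factors lie in B. The Python sketch below is illustrative only. Rather than building B, S and E from a right-linear grammar as in the proof, the hypothetical helper `base_sets` reads them off a finite sample of L, which is faithful only if the sample contains every length-2 factor, every first and last symbol and every one-letter word of L; it also keeps the one-letter words in a separate set instead of inside B.

```python
def base_sets(sample):
    """Read B (length-2 factors), S (first symbols), E (last symbols) and L ∩ Σ off a sample."""
    B = {w[i:i + 2] for w in sample for i in range(len(w) - 1)}
    S = {w[0] for w in sample if w}
    E = {w[-1] for w in sample if w}
    singles = {w for w in sample if len(w) == 1}
    return B, S, E, singles


def in_crossover_language(w, B, S, E, singles):
    """Membership in (GSCO*(B) ∩ SΣ*E) ∪ (L ∩ Σ)."""
    if len(w) == 0:
        return False  # a crossover language never contains ε
    if len(w) == 1:
        return w in singles
    return w[0] in S and w[-1] in E and all(w[i:i + 2] in B for i in range(len(w) - 1))


# a+b+ = GSCO*({aabb, aaabbb}) (Example 6): B = {aa, ab, bb}, S = {a}, E = {b}.
sets = base_sets(["aabb", "aaabbb"])
print([w for w in ["ab", "aaabb", "ba", "abab", "a"] if in_crossover_language(w, *sets)])
# ['ab', 'aaabb']
```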
Corollary 6. If $L \cap \Sigma = \emptyset$ (that is, L does not contain any word of length 1), then $L = \mathrm{GSCO}^*(B) \cap S\Sigma^*E$.
Proof is immediate.

Corollary 7. Let Σ be an alphabet. If all the words in L are of the form $\Sigma\Sigma^*\Sigma$ (that is, if L contains words that start with every possible symbol and end with every possible symbol), then $L = \mathrm{GSCO}^*(B)$.

Proof is immediate, since S = E = Σ.

Given a crossover language L, the above theorem gives the construction of the set B with which one can generate L by iterated GSCO. The base set of a crossover language L contains all the subwords of L of length 2 along with the words of length 1 in L, whereas the 2B set of L contains all the subwords of L of length 2, the words of L of length 1, and ε (the word of length 0) if ε is in L. In other words, if L does not contain ε, then the base set of L and 2B(L) are the same. The next lemma shows that the base set of a crossover language is unique.

Lemma 7. The base set of a crossover language is unique.
Proof is obvious.

7 Comparison with other sub-regular families 

Since GSCO*(L) is always a regular language, in this section we compare various subclasses of the regular languages with the crossover languages. For this purpose, we consider the following classes of crossover languages.

Definition 10 We define the following classes of crossover languages based on R, the set of overlaps. Let Σ be the alphabet of the axiom.

TSyGSCO: class of languages that can be generated by the operation $\mathrm{GSCO}^*_R$ over an axiom, where R = Σ.
SyGSCO: class of languages that can be generated by the operation $\mathrm{GSCO}^*_R$ over an axiom, where $R \subseteq \Sigma$.
StGSCO: class of languages that can be generated by the operation $\mathrm{GSCO}^*_R$ over an axiom, where $R \subseteq \Sigma^+$.
TStGSCO: class of languages that can be generated by the operation $\mathrm{GSCO}^*_R$ over an axiom, where $R = \Sigma^+$.
Theorem 10. 1. TSyGSCO ⊂ SyGSCO

2. SyGSCO ⊂ StGSCO

3. TStGSCO ⊂ StGSCO

4. TStGSCO = TSyGSCO

Proof. Let L ∈ TSyGSCO. Then there exist a set R and a set $L_0$ such that $L = \mathrm{GSCO}^*_R(L_0)$, where R is the alphabet of $L_0$. That is, L is generated by crossover in which all the overlaps are over the symbols of the alphabet of $L_0$. We have $L = \mathrm{GSCO}^*_R(L_0)$ with R contained in the alphabet of $L_0$, which implies L ∈ SyGSCO. The converse is not true. The language $a^+b^2 \in$ SyGSCO, because $a^+b^2 = \mathrm{GSCO}^*_a(a^+b^2)$. But $a^+b^2 \notin$ TSyGSCO, since $a^+b^2$ is not closed with respect to the operation $\mathrm{GSCO}_b$.

Let L ∈ SyGSCO. Then there exist a set R and a set $L_0$ such that $L = \mathrm{GSCO}^*_R(L_0)$, $R \subseteq \Sigma_{L_0}$. Since $R \subseteq \Sigma_{L_0}$, we have $R \subseteq \Sigma_{L_0}^+$. This implies L ∈ StGSCO. The converse is not true: $(aa)^+b^2(aa)^+ \in$ StGSCO with respect to $R = \{b^2\}$. However, this language is not closed with respect to any symbol.

Let L ∈ TStGSCO. Then there exist $L_0$ and R such that $L = \mathrm{GSCO}^*_R(L_0)$, where $R = \mathrm{sub}(L_0)$. Since $R \subseteq \Sigma_{L_0}^+$, L ∈ StGSCO. The converse is not true. The language $a^+b^2 \in \mathrm{StGSCO}_R$ with $R = \{b^2\}$, but it is not in TStGSCO.

Item 4 is immediate from Corollary 1.
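The closure arguments in the proof of Theorem 10 can be spot-checked for a particular overlap set R. The Python sketch below assumes $\mathrm{GSCO}_R(w_1, w_2) = \bigcup_{r \in R} w_1 \bowtie_r w_2$ and tests, on a finite sample of $a^+b^2$, whether all crossovers stay inside $a^+b^2$: they do for R = {a} and R = {bb} but not for R = {b} (for example, $abb \bowtie_b abb$ contains ab). A bounded sample can only refute closure, not prove it; the helper names are ours.

```python
from itertools import product


def cross(w1, w2, x):
    """w1 ⋈_x w2 = Prefix_x(w1) · x · Suffix_x(w2)."""
    n = len(x)
    heads = {w1[:i] for i in range(len(w1) - n + 1) if w1.startswith(x, i)}
    tails = {w2[i + n:] for i in range(len(w2) - n + 1) if w2.startswith(x, i)}
    return {h + x + t for h in heads for t in tails}


def gsco_r(w1, w2, R):
    """GSCO_R(w1, w2): union of w1 ⋈_r w2 over the overlaps r in R."""
    return set().union(*(cross(w1, w2, r) for r in R))


def closed_on_sample(sample, member, R):
    """True if every GSCO_R-crossover of two sample words satisfies `member`."""
    return all(member(w) for w1, w2 in product(sample, repeat=2) for w in gsco_r(w1, w2, R))


def in_a_plus_bb(w):
    """Membership in a+b^2."""
    return len(w) >= 3 and w.endswith("bb") and set(w[:-2]) == {"a"}


sample = ["a" * i + "bb" for i in range(1, 5)]
for R in (["a"], ["b"], ["bb"]):
    print(R, closed_on_sample(sample, in_a_plus_bb, R))
# ['a'] True
# ['b'] False
# ['bb'] True
```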

We now examine the relationships of the class of GSCO languages with a series of well-known subfamilies of REG, considered in [18, 26, 19].

Definition 11 A language $L \subseteq \Sigma^*$ is called

Combinational if and only if $L = \Sigma^*U$ for some $U \subseteq \Sigma$;

Definite if and only if $L = L_1 \cup \Sigma^*L_2$, where $L_1, L_2$ are finite subsets of Σ*;

Nilpotent if and only if either L or $\Sigma^* \setminus L$ is finite;

Commutative if and only if x ∈ L implies that all permutations of x are in L;
Suffix-closed if and only if $\mathrm{Suffix}(L) \subseteq L$;

Non-counting (extended star-free) if and only if there is an integer k ≥ 1 such that for every x, y, z ∈

Σ*, y ≠ ε, we have $xy^kz \in L$ if and only if $xy^{k+1}z \in L$;
Power-separating if and only if for each x ∈ Σ* there is a natural number m ≥ 1 such that either $L \cap$

$\{x^n \mid n \ge m\} = \emptyset$ or $\{x^n \mid n \ge m\} \subseteq L$;
Ordered if and only if L is accepted by some deterministic finite automaton $(K, \Sigma, \delta, s_0, F)$ with a totally

ordered set of states K, such that for each a ∈ Σ, the relation $s \le s'$ implies $\delta(s, a) \le \delta(s', a)$.

We denote by COMB, DEF, NIL, COMM, SUF, ESF, PS, ORD the families of combinational, definite, nilpotent, commutative, suffix-closed, non-counting, power-separating and ordered languages, respectively. The relations between the different types of GSCO classes and the above subclasses are given in Fig. 3.

Theorem 11. SH = SyGSCO; NCH = StGSCO. 

Proof. We have to show that a language L that can be generated by a simple splicing system can also be generated by SyGSCO, and vice versa. For that, it is enough to show that, for any axiom A, there exists a language $L_0$ such that

$$\sigma^*_R(A) = \mathrm{GSCO}^*_R(L_0)$$

for some R, and vice versa. We use induction. Consider $\sigma^*_R(A)$, where R is a subset of the alphabet of A and σ is the splicing scheme of a simple splicing system. Let $L_0 = A$. Then $x \in \sigma^*_R(A)$ implies $x \in \sigma^i_R(A)$ for some i ≥ 0. If i = 0, $x \in A = L_0$. Assume that, for some i ≥ 0, every $y \in \sigma^i_R(A)$ satisfies $y \in \mathrm{GSCO}^*_R(L_0)$. Let $w \in \sigma^{i+1}_R(A)$. Then there exist $w_1, w_2 \in \sigma^i_R(A)$ such that $(w_1, w_2) \vdash_a w$, a ∈ R. That is, $w_1 = u_1au_2$, $w_2 = v_1av_2$ and $w = u_1av_2$. By the induction hypothesis, $u_1au_2, v_1av_2 \in \mathrm{GSCO}^*_R(L_0)$. Then $u_1av_2 \in \mathrm{GSCO}_R(u_1au_2, v_1av_2)$, so $w \in \mathrm{GSCO}^*_R(L_0)$. Hence $\sigma^*_R(A) \subseteq \mathrm{GSCO}^*_R(L_0)$.

On the other hand, consider $\mathrm{GSCO}^*_R(L_0)$, where R is a subset of the alphabet of $L_0$. Let $w \in \mathrm{GSCO}^*_R(L_0)$. Then $w \in \mathrm{GSCO}^i_R(L_0)$ for some i. Let us put $L_0 = A$.

If i = 0, then $w \in \sigma^*_R(A)$ holds trivially. Using the induction hypothesis, assume that every $y \in \mathrm{GSCO}^i_R(L_0)$ satisfies

$y \in \sigma^*_R(A)$.
Let $w \in \mathrm{GSCO}^{i+1}_R(L_0)$. Then $w \in \mathrm{GSCO}_R(w_1, w_2)$, where $w_1, w_2 \in \mathrm{GSCO}^i_R(L_0)$. So there exists an a ∈ R such that $w \in \mathrm{GSCO}_a(w_1, w_2)$, i.e. $w_1 = u_1au_2$, $w_2 = v_1av_2$ and $w = u_1av_2$.

By the induction hypothesis, $w_1 \in \sigma^*_R(A)$ and $w_2 \in \sigma^*_R(A)$, which implies $w \in \sigma^*_R(A)$, since $(w_1, w_2) \vdash_a w$. Hence $\mathrm{GSCO}^*_R(L_0) \subseteq \sigma^*_R(A)$, which proves SH = SyGSCO. Similarly, we can prove NCH = StGSCO.

Theorem 12. L ∈ TSyGSCO if and only if L is closed with respect to the operation $\mathrm{GSCO}_a$ for all $a \in \Sigma_L$ (the alphabet of L).

Proof. Let L ∈ TSyGSCO. Then there exists a set $L_0$ such that $L = \mathrm{GSCO}^*_R(L_0)$, R = Σ. This implies that L is closed with respect to $\mathrm{GSCO}_a$ for all $a \in \Sigma_L$.

Conversely, let L be closed with respect to the operation $\mathrm{GSCO}_a$ for all a ∈ Σ, i.e. $\mathrm{GSCO}_a(x, y) \subseteq L$ for all x, y ∈ L and all a ∈ Σ. This implies $\mathrm{GSCO}^1_a(L) \subseteq L$. Now $\mathrm{GSCO}_a(\mathrm{GSCO}_a(x, y), z) \subseteq L$ for all x, y, z ∈ L, i.e. $\mathrm{GSCO}^2_a(L) \subseteq L$.

Continuing on the same line, we get

$$\mathrm{GSCO}^i_a(L) \subseteq L \implies \bigcup_i \mathrm{GSCO}^i_a(L) \subseteq L.$$

Since $L \subseteq \bigcup_i \mathrm{GSCO}^i_a(L)$, we conclude

$$\mathrm{GSCO}^*(L) = L,$$

i.e. L is in TSyGSCO.

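Theorem 13. L ∈ StGSCO if and only if there exists $R \subseteq \Sigma_L^+$ such that L is closed with respect to the operation $\mathrm{GSCO}_R$.

Proof. L ∈ StGSCO implies that there exist a set $L_0$ and a set R such that

$$L = \mathrm{GSCO}^*_R(L_0), \quad R \subseteq \Sigma_{L_0}^+.$$

Hence L is closed with respect to the operation $\mathrm{GSCO}_R$.

Conversely, let $R \subseteq \Sigma_L^+$ and let L be closed with respect to the operation $\mathrm{GSCO}_R$. So

$$\forall x, y \in L:\ \mathrm{GSCO}_R(x, y) \subseteq L,$$

i.e. $\mathrm{GSCO}_R(\mathrm{GSCO}_R(x, y), z) \subseteq L$ for all x, y, z ∈ L,
i.e. $\mathrm{GSCO}^2_R(L) \subseteq L$.

Continuing on the same lines,

$$\mathrm{GSCO}^i_R(L) \subseteq L \ \ \forall i > 0 \implies \bigcup_i \mathrm{GSCO}^i_R(L) \subseteq L \implies \mathrm{GSCO}^*_R(L) \subseteq L.$$

Hence we can conclude that

$$L = \mathrm{GSCO}^*_R(L).$$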

Head has proved that NCH = SLT [13]. Thus we have the following theorem, whose proof is immediate.

Theorem 14. L ∈ SLT if and only if there exists $R \subseteq \Sigma_L^+$ such that L is closed with respect to the operation $\mathrm{GSCO}_R$.

Theorem 15. SLT ⊂ ESF

Proof. Let L ∈ SLT. Then there exist k and $w \in \Sigma^k$ such that w is a constant for L. That is, $xwy, pwq \in L$ implies $xwq, pwy \in L$. Consider $xy^kz \in L$. Then $xyy^kz, xy^kyz \in L$ implies that $xy^{k+2}z \in L$. Hence $xy^{k+1}z \in L$, which implies L ∈ ESF. Hence SLT ⊆ ESF. But the converse is not true: the language $abb^+c \cup pbb^+q$ is in ESF, but the SLT property does not hold for any k ≥ 1.
Fig. 3. Relations between different subclasses of the regular languages and their relations with the GSCO classes.

Theorem 16. The relations in Fig. 3 hold; the arrows indicate strict inclusions, and any two families not linked by a path in the diagram are incomparable.

Proof. This diagram appears in [19] without the GSCO classes. Hence all relations between families other than the GSCO classes are known.

1. COMB ⊂ TSyGSCO. Let L ∈ COMB, i.e. $L = \Sigma^*U$, $U \subseteq \Sigma$. $\Sigma^*U$ is closed with respect to the GSCO operation. This implies that $\Sigma^*U$ is a crossover language. Hence L ∈ TSyGSCO.

This inclusion is strict: $a^*b^* \in$ TSyGSCO but $a^*b^* \notin$ COMB.

2. DEF and TSyGSCO are incomparable.

$ab^+ \in$ TSyGSCO − DEF, and $(a + b)^+aabb \in$ DEF − TSyGSCO, since $(a + b)^+aabb$ is not a crossover language.

3. TSyGSCO and NIL are incomparable.

$\{a^2, a^3\} \in$ NIL − TSyGSCO, and $a^*b^* \in$ TSyGSCO − NIL.

4. TSyGSCO and COMM are incomparable.

$\{ab, ba\} \in$ COMM − TSyGSCO. The other direction is obvious.

8 Conclusion 

We have presented a new operation, GSCO, over words and languages, which in some sense abstracts the crossover of chromosomes in living organisms. This study of GSCO reveals several interesting results, such as the fact that GSCO*(L) is regular for any L. This result could be useful in settings where regular languages need to be generated.

We conclude this paper by pointing out some further directions of research. A study of generalised parallel crossover of words and languages, where parallelism is allowed (i.e. crossover may occur at more than one place), can be initiated, and a comparison between generalised sequential crossover and generalised parallel crossover could yield worthwhile results.

Though this study has produced a characterisation of strictly locally testable languages (SLT) in terms of GSCO, it does not compare this new characterisation with the characterisations of SLT available earlier. The characterisations of SLT could be compared in terms of complexity, which is worth investigating.

In our opinion, the construction of the B set can be used for data compression in the following sense. To store a crossover language L, which is closed under GSCO, it is sufficient to store the sets B, S and E; L can be retrieved from these by the iterated GSCO operation.